mortazavilab / swan_vis

A Python library to visualize and analyze long-read transcriptomes
https://freese.gitbook.io/swan/
MIT License
54 stars 11 forks

Missing transcripts #14

Closed rsalz closed 2 years ago

rsalz commented 2 years ago

Some transcripts that are present in 'all_talon_abundance_filtered.tsv' and 'all_talon_observedOnly.gtf' are missing from the SwanGraph object after I load them. How could this be? Is there an additional filtering step that removes transcripts as I add them?

More specifically, only 6,070 of 12,812 novel transcripts remain after I load them into the SwanGraph object.

I'm using the standard sg.add_transcriptome(talon_db, pass_list=pass_list) and sg.add_abundance(ab_file) calls, as you suggest in the tutorial.

fairliereese commented 2 years ago

Hi there, Swan's add_transcriptome() function has an argument, include_isms, whose default value is False (documented here: https://freese.gitbook.io/swan/code-documentation/swangraph). Internally, our group typically does not analyze ISM (incomplete splice match) transcripts, as they are both numerous and dubious. If you wish to keep them in your analyses, just pass include_isms=True when you call add_transcriptome() on your SwanGraph!
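To check whether ISMs account for the gap, one option is to tally the novelty labels in the TALON abundance file before loading anything into Swan. The following is a rough sketch rather than an official recipe: the helper names count_novelty and add_transcriptome_with_isms are made up for illustration, the inline table is toy data, and the 'transcript_novelty' column name is an assumption about the TALON abundance format that should be checked against your own file. The Swan call simply adds include_isms=True to the add_transcriptome() call quoted above.

```python
import csv
import io
from collections import Counter

# Toy stand-in for a TALON abundance table (made-up rows; real files
# have many more columns). The 'transcript_novelty' column name is an
# assumption here; verify it against your own file's header.
SAMPLE_TSV = (
    "annot_transcript_id\ttranscript_novelty\n"
    "tx1\tKnown\n"
    "tx2\tISM\n"
    "tx3\tISM\n"
    "tx4\tNIC\n"
    "tx5\tNNC\n"
)


def count_novelty(handle, column="transcript_novelty"):
    """Count transcripts per novelty label in a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


def add_transcriptome_with_isms(talon_db, pass_list):
    """Hypothetical helper: load a TALON db into a SwanGraph, keeping ISMs.

    swan_vis is imported lazily so the counting code above runs without it.
    """
    import swan_vis as swan

    sg = swan.SwanGraph()
    # Same call as in the question, with include_isms switched on.
    sg.add_transcriptome(talon_db, pass_list=pass_list, include_isms=True)
    return sg


counts = count_novelty(io.StringIO(SAMPLE_TSV))
novel = sum(n for label, n in counts.items() if label != "Known")
print(counts["ISM"], "of", novel, "novel transcripts are labelled ISM")
```

For a real file, replace the io.StringIO(SAMPLE_TSV) argument with an opened handle, for example open('all_talon_abundance_filtered.tsv', newline=''). If the ISM count roughly matches the number of missing transcripts, the include_isms default is the likely explanation.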

rsalz commented 2 years ago

*EDIT: figured out I should be looking at 'path' not 'loc_path', my bad!! Thanks!

fairliereese commented 2 years ago

I had a feeling this was the case! Good detective work!