mortazavilab / swan_vis

A Python library to visualize and analyze long-read transcriptomes
https://freese.gitbook.io/swan/
MIT License

Novelty info not found #9

Closed SziKayLeung closed 3 years ago

SziKayLeung commented 3 years ago

Hello Fairlie,

Thank you for the swan package - love the visualisations! I just had a question regarding the novel splice junction visualisation, as the warning message "Novelty info not found for x. Transcripts without novelty information will be labelled 'Undefined'." appears every time I add a dataset.

I am using the mouse reference genome GTF as the annotation GTF and my Iso-Seq GTF output from SQANTI as the dataset input (bypassing TALON).

Is the novelty information documented in the GTF, or is there a specific file I need to include? Any guidance on this would be greatly appreciated!

Thank you, Szi Kay

fairliereese commented 3 years ago

Hey, thanks for using swan and I'm glad you've found it useful!

This message isn't anything to worry about. It just pops up when you add data without novelty information to a SwanGraph that already has novelty information (which is added by default when you add the annotation, as all transcripts in the annotation should be "Known").

Currently, Swan only supports adding novelty information via TALON databases or GTFs. I would be happy to work with you to add functionality to add novelty information via SQANTI GTFs though.

An additional workaround would be to create a TSV/CSV with two columns, transcript ID and novelty category, and merge it with the SwanGraph.t_df object. I included some rough code below to use this workaround:

import swan_vis as swan
import pandas as pd

sg = swan.SwanGraph()
sg.add_annotation('my_annotation.gtf')
sg.add_dataset('my_sqanti_transcriptome.gtf')

# drop the existing, uninformative novelty category
sg.t_df.drop('novelty', axis=1, inplace=True)

# read novelty CSV with format transcript_id,novelty_category
novelty_df = pd.read_csv('my_sqanti_novelty.csv', names=['tid', 'novelty'])

# merge novelty in with SwanGraph
sg.t_df = sg.t_df.merge(novelty_df, how='left', left_index=True, right_on='tid')
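One thing to keep an eye on with the merge above: depending on the pandas version, combining `left_index=True` with `right_on` may not carry the transcript-ID index of `t_df` through to the result. Here is a minimal sketch of an index-preserving variant, run on toy data rather than a real SwanGraph. The toy `t_df`, the helper name `attach_novelty`, and the sample transcript IDs and categories are all hypothetical stand-ins for illustration; none of them are part of Swan's API.

```python
import io

import pandas as pd


def attach_novelty(t_df, novelty_df, fill="Undefined"):
    """Return a copy of t_df with a 'novelty' column looked up by transcript ID.

    Hypothetical helper: assumes t_df is indexed by transcript ID and
    novelty_df has 'tid' and 'novelty' columns.
    """
    # Build a tid -> novelty lookup, keeping the first entry for repeated IDs.
    lookup = novelty_df.drop_duplicates("tid").set_index("tid")["novelty"]
    out = t_df.copy()
    # reindex keeps the row order and index of t_df; IDs that are missing
    # from the CSV get the fill value instead of NaN.
    out["novelty"] = lookup.reindex(out.index).fillna(fill).to_numpy()
    return out


# Toy stand-in for SwanGraph.t_df, indexed by transcript ID (an assumption).
t_df = pd.DataFrame(
    {"gid": ["g1", "g1", "g2"], "novelty": ["Undefined"] * 3},
    index=pd.Index(["t1", "t2", "t3"], name="tid"),
)

# Toy headerless CSV in the format transcript_id,novelty_category.
csv_text = "t1,Known\nt3,NIC\n"
novelty_df = pd.read_csv(io.StringIO(csv_text), names=["tid", "novelty"])

result = attach_novelty(t_df, novelty_df)
print(result)
```

With a real SwanGraph the same idea would be to pass `sg.t_df` in and assign the result back, but I have only checked this against the toy frame, so treat it as a starting point.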

Let me know if this answers your question, if the code works for you, or if you would like to help me implement importing novelty categories directly from SQANTI GTFs!

SziKayLeung commented 3 years ago

Hello @fairliereese,

Thank you for the prompt reply - the code works really well, and I have successfully annotated the novelty categories from SQANTI. I will be happy to work with you to add the functionality via SQANTI GTFs, though I must admit my experience with Python is limited.

fairliereese commented 3 years ago

Ah, I was mostly wondering if you could provide me with a sample SQANTI GTF that I could refer to. If you'd like, feel free to email me one at freese@uci.edu!