mritchielab / FLAMES

A framework for performing single-cell and bulk read full-length analysis of mutations and splicing.
https://mritchielab.github.io/FLAMES/
GNU General Public License v3.0

Isoform category quantification #32

Closed: lscdanson closed this issue 6 months ago

lscdanson commented 6 months ago

Hi, I'm wondering how I could quantify the isoform structural categories shown in Fig. 2B of your paper. I could only find three feature metadata slots ('transcript_id', 'gene_id', 'FSM_match') in the SCE object generated by the sc_long_pipeline() function, and none of them seems relevant. Thanks so much!

ChangqingW commented 6 months ago

This was done with SQANTI, as described in the "Isoform classification and splicing analysis" subsection under Methods.
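For readers who want to tally the categories once a classification table is available, here is a minimal sketch. It assumes a tab-separated table with one row per isoform and a `structural_category` column, which is how SQANTI3 lays out its classification output; the column name, the file name in the usage comment, and the helper `count_structural_categories` are illustrative assumptions, not part of FLAMES, so check them against the output of the SQANTI version you run.

```python
import csv
from collections import Counter


def count_structural_categories(lines, column="structural_category"):
    """Count isoforms per structural category.

    `lines` is any iterable of text lines from a tab-separated table
    with a header row (for example, an open file handle).
    The default column name is an assumption; adjust it if your table differs.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[column] for row in reader)


# Usage (hypothetical file name):
# with open("isoforms_classification.txt", newline="") as handle:
#     for category, n in count_structural_categories(handle).most_common():
#         print(category, n)
```

The resulting counts can then be plotted as a bar chart of isoforms per category.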

lscdanson commented 6 months ago

@ChangqingW thanks for the prompt reply! A further question regarding transcript ID to transcript name conversion. It looks like some of the transcript IDs are artificially created (e.g. "ENSG00000001629.10_92245974_92294977_1") – are these novel transcripts in the Ensembl database? And is there a way to convert the transcript IDs (given that some are artificially created) to transcript names (e.g. CD4-201)? Many thanks!

ChangqingW commented 6 months ago

The IDs that consist of an ENSG ID with extra underscores are novel isoforms identified by FLAMES. You won't find them in public annotation databases because they are novel. I am not sure what you mean by converting to transcript names; they won't have one.
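To separate such IDs from annotated ones before attempting any name lookup, a small sketch along these lines may help. The pattern (an ENSG gene ID followed by three underscore-separated numbers) is inferred only from the single example in this thread and is an assumption, not a documented FLAMES format; `split_transcript_id` is a hypothetical helper.

```python
import re

# Assumed layout, inferred from one example ID in this thread:
# <ENSG gene ID>_<number>_<number>_<number>
NOVEL_ID = re.compile(r"^(?P<gene_id>ENSG\d+(?:\.\d+)?)_\d+_\d+_\d+$")


def split_transcript_id(transcript_id):
    """Flag an ID as novel if it matches the assumed pattern.

    For novel IDs, the embedded gene ID is returned so the gene
    (though not a transcript name) can still be looked up.
    """
    match = NOVEL_ID.match(transcript_id)
    if match is None:
        return {"transcript_id": transcript_id, "novel": False, "gene_id": None}
    return {
        "transcript_id": transcript_id,
        "novel": True,
        "gene_id": match.group("gene_id"),
    }
```

Only IDs flagged as not novel are candidates for conversion to a transcript name; for the novel ones, the gene ID is the most that can be mapped.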

lscdanson commented 6 months ago

Thanks for the clarification!