Some questions about output files

happypiggyzjx commented 1 week ago

Hi Woody, me again!（： I recently went through all the result files carefully and came across the following two questions:

1. Is the PSIsigma_r10_ir3.txt file filtered from the PSIsigma.db file? If so, could you please tell me the exact filtering rule (e.g. which value it is based on, what the threshold is, whether it is greater than or less than, etc.)?
2. I'm not quite sure how to interpret the values in the “Event Type” column. As I understand it, the NMD label only describes the function of the event; it does not mean the event is novel, i.e. outside the annotation. In other words, I'd like to know the specific relationship between NMD and novel. From other questions and answers I learned that the Ex. prefix stands for novel. My aim is to distinguish novel events (outside the annotation file) from non-novel events (within the annotation file). How should I tell them apart?

I look forward to your reply; it is very important for my current research~

Best, happypiggy

happypiggyzjx commented 1 week ago

I recently did some comparisons. Could you tell me whether the filtering from the PSIsigma.db file to the PSIsigma_r10_ir3.txt file is based on the N and T columns? As I understand it, the columns in PSIsigma_r10_ir3.txt that could serve as filtering conditions are N, T, ΔPSI, pval, FDR, and the PSI of each replicate (the N/T Value columns). However, I see no sign in the file that any of these columns has been filtered on. The only possibility I can think of, going by the definition of the N/T columns on your official website, is that events without enough supporting reads are filtered out. Am I understanding this correctly?

wososa commented 5 days ago

Hi @happypiggyzjx ,

PSIsigma.db is the database built from the .gtf and .SJ.out.tab files. It documents all possible splicing events. PSIsigma_r10_ir3.txt reports only the splicing events whose sequencing coverage is >10x; in other words, r10 in the file name means a read coverage of 10. When --fmode 3 is used, there is no additional filtering. The sequencing coverage is represented by the denominator value, and every splicing event has its denominator value documented in the denominator.gct table.
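As a rough illustration of the coverage filter described above (not PSI-Sigma's actual code), the idea can be sketched in Python. The function name, the event IDs, and the numbers are hypothetical, and the thread does not say how per-sample denominators are combined, so the `summarize` rule here is an assumption that can be swapped out:

```python
# Hypothetical sketch of a denominator-based coverage filter.
# This is NOT PSI-Sigma's implementation; names and data are made up.

def filter_events_by_coverage(denominators, min_reads=10, summarize=min):
    """Return the event IDs whose summarized denominator exceeds min_reads.

    denominators: dict mapping an event ID to a list of per-sample
                  denominator (read coverage) values.
    min_reads:    coverage threshold; 10 mirrors the "r10" in the file name.
    summarize:    how to collapse per-sample values into one number.
                  Assumption: the thread does not state PSI-Sigma's rule.
    """
    kept = []
    for event_id, values in denominators.items():
        # Strict ">" follows the ">10x" wording in the explanation above.
        if values and summarize(values) > min_reads:
            kept.append(event_id)
    return kept


# Made-up denominator values, for illustration only.
example = {
    "event_A": [25, 40, 31],
    "event_B": [3, 12, 8],
    "event_C": [11, 11, 11],
}

print(filter_events_by_coverage(example))
```

With the default `summarize=min`, an event is kept only if every sample exceeds the threshold; passing `summarize=max` would keep an event if any single sample does. Which behaviour matches the real output should be checked against denominator.gct.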
The Event Type column shows whether the transcript containing the target exon is annotated as NMD in the .gtf file. NMD stands for nonsense-mediated decay, and it is sometimes useful to know this information. The annotation is not necessarily accurate, because one exon can be used by multiple transcripts and PSI-Sigma doesn't predict whether all of those transcripts are NMD targets. A novel event is different from an NMD event: a novel event has undocumented splice junctions.
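To make the "undocumented splice junctions" idea concrete, here is a minimal Python sketch. It is only an illustration of the concept, not how PSI-Sigma labels events: the function name, the junction tuple layout `(chrom, start, end, strand)`, and the coordinates are all hypothetical, and treating an event as novel when any one of its junctions is missing from the annotation is an assumed reading of the explanation above.

```python
# Hypothetical sketch: flag an event as novel when it uses a junction
# that is absent from the annotation. Not PSI-Sigma's actual logic.

def event_is_novel(event_junctions, annotated_junctions):
    """Return True if any junction of the event is not in the annotation.

    event_junctions:     iterable of (chrom, start, end, strand) tuples
                         used by the splicing event.
    annotated_junctions: iterable of junction tuples derived from the
                         annotation (e.g. from consecutive exons in a .gtf).
    """
    annotated = set(annotated_junctions)
    return any(junction not in annotated for junction in event_junctions)


# Made-up coordinates, for illustration only.
annotated = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
}

known_event = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")]
novel_event = [("chr1", 100, 200, "+"), ("chr1", 250, 400, "+")]

print(event_is_novel(known_event, annotated))
print(event_is_novel(novel_event, annotated))
```

Note that this check is independent of the NMD label: an event can be annotated as NMD and still use only documented junctions.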

Let me know if you have further questions.

Best, Woody

happypiggyzjx commented 52 minutes ago

Hi Woody,

Thank you, those explanations are detailed enough for me at the moment! I really appreciate it. Have a wonderful day~(⊙ v ⊙)

Best, happypiggy

wososa / PSI-Sigma

Some questions about output files #68