Closed ganguvamshi closed 5 years ago
Not recommended. I assume you mean ribo-depletion libraries. The rationale for using a polyA library is to ensure that the RNA molecules detected are mature, so that we have high confidence that the remaining introns in the RNA are due to intron retention (IR). In ribo-depletion libraries, it's hard to distinguish true IR from introns in nascent transcripts that are still being processed.
Thank you very much for your reply! Could you please advise me on the threshold/cut-off for determining significance in differential IR analysis (using the GLM from IRFinder)? Is a cut-off of p-value < 0.05 or p-value < 0.01 enough? In test runs on a few poly-A enriched datasets (with about 50 million reads), I observed that very few introns (fewer than 20) are significant by adjusted p-value. Do you have any idea why this number is so low?
There is no gold standard for a p-value/FDR cutoff. It's all about how much Type I error you are willing to tolerate. Actually, I would say all cutoffs in bioinformatics should be case specific. In practice, DESeq2 uses an FDR cutoff of 0.1 by default for differential gene expression (DGE). Keep two more things in mind: 1) The GLM method requires a solid estimate of variance/dispersion. With a small number of samples, that estimate is hard to obtain or can be biased. 2) 50M reads are sufficient to carry out DGE analysis. However, we're talking about introns here, which have far lower coverage in a library. You really should filter out introns with low sequencing depth. This also changes the total number of tests you have to do and consequently the FDR.
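To make the last point concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python, applied to the same small p-values with and without a set of low-depth introns in the test pool. All p-values are made up for illustration, and the function is a generic BH implementation, not code taken from IRFinder or DESeq2.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: 3 well-covered introns with small p-values, plus
# 97 low-depth introns whose p-values carry little information.
well_covered = [0.002, 0.004, 0.03]
low_depth = [0.5 + 0.005 * k for k in range(97)]

padj_all = benjamini_hochberg(well_covered + low_depth)
padj_filtered = benjamini_hochberg(well_covered)

# Count how many of the well-covered introns pass FDR < 0.1 in each case.
hits_all = sum(p < 0.1 for p in padj_all[:3])
hits_filtered = sum(p < 0.1 for p in padj_filtered)
print(hits_all, hits_filtered)
```

Note that the filter has to be chosen independently of the test result (for example on depth, decided before testing), otherwise the adjusted p-values are no longer valid.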
Thanks for the explanation. I understand from the paper and your explanation that intron depth should also be considered along with the differential p-values from DESeq2. I actually tried applying the following thresholds: p < 0.05, splice depth of about 4 (extracted from column 19 of the quantification file) and IR ratio of 0.1 in at least one sample. However, for many of the datasets (with good coverage) explored so far, I end up with very few significant introns using the above thresholds, and with almost nothing using FDR 0.1. Does this mean that, in general, the intron retention patterns in these datasets are not biologically important, since there are very few high-confidence IRs, unlike DEGs?
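For reference, the thresholds described above could be written as a small filter like the sketch below. The record layout and field names (`pvalue`, `splice_depth`, `ir_ratio`) are hypothetical, and reading "splice depth about 4" as "at least 4 in every sample" is an assumption; adjust it to however the depth was actually summarised.

```python
def passes_filters(intron, p_cutoff=0.05, min_splice_depth=4, min_ir_ratio=0.1):
    """Apply the three thresholds to one intron record (hypothetical layout)."""
    return (
        intron["pvalue"] < p_cutoff
        # Assumption: every sample must reach the minimum splice depth.
        and min(intron["splice_depth"]) >= min_splice_depth
        # IR ratio must reach the cutoff in at least one sample.
        and max(intron["ir_ratio"]) >= min_ir_ratio
    )


# Made-up records, one per intron, with per-sample depth and IR ratio.
introns = [
    {"name": "intron_1", "pvalue": 0.01, "splice_depth": [12, 9, 15, 11], "ir_ratio": [0.02, 0.03, 0.15, 0.18]},
    {"name": "intron_2", "pvalue": 0.01, "splice_depth": [3, 9, 15, 11], "ir_ratio": [0.02, 0.03, 0.15, 0.18]},
    {"name": "intron_3", "pvalue": 0.20, "splice_depth": [12, 9, 15, 11], "ir_ratio": [0.02, 0.03, 0.15, 0.18]},
    {"name": "intron_4", "pvalue": 0.01, "splice_depth": [12, 9, 15, 11], "ir_ratio": [0.02, 0.03, 0.05, 0.08]},
]

kept = [rec["name"] for rec in introns if passes_filters(rec)]
print(kept)
```

Keeping the cutoffs as keyword arguments makes it easy to rerun the same filter with a stricter depth requirement.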
Please note that we did not implement the GLM-based method for differential IR in the paper; that method is relatively stringent with small sample sizes. Instead, we applied the Audic test, and we were working on samples with distinct cell morphologies. There we observed ~80 differential IRs. Our conclusion is that the IR signature, like gene expression, is a cellular feature under strict regulation.
Your observation of few differential IRs can be either expected or intriguing, depending on the exact comparison you've set up. Is it between different cell types? Or is it between WT and mutants (and what kind of mutation)?
Before you jump into statistical tests, ask yourself whether you really expect significant IR changes in terms of biology. E.g. IR is a splicing-related event, which is most likely controlled by splicing factors and epigenetics. Are you in that scenario? If you believe so, then you can do some quick checks by extracting IR values (from the IRFinder report) across samples. You can average the IR values within each of your two conditions, calculate the difference and rank the introns by it. You can then check the range of IR changes, which gene contains the most dramatic change, and so on. Do these results fit your hypothesis? If not, you might need to change your hypothesis. BTW, a splice depth of about 4 is way too low IMO.
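The quick check described above (average IR per condition, take the difference, rank) might look like the sketch below. The intron names, condition labels and IR values are invented for illustration; in practice the values would come from the IR ratio reported for each sample.

```python
from statistics import mean

# Hypothetical IR ratios per intron, grouped by condition.
ir_values = {
    "GENE_A/intron3": {"control": [0.05, 0.07, 0.06], "treated": [0.30, 0.28, 0.35]},
    "GENE_B/intron1": {"control": [0.20, 0.22, 0.21], "treated": [0.19, 0.23, 0.21]},
    "GENE_C/intron7": {"control": [0.40, 0.38, 0.42], "treated": [0.25, 0.30, 0.26]},
}


def rank_ir_changes(ir_values, cond_a, cond_b):
    """Return (intron, mean IR in cond_b minus mean IR in cond_a), largest change first."""
    rows = []
    for intron, by_condition in ir_values.items():
        delta = mean(by_condition[cond_b]) - mean(by_condition[cond_a])
        rows.append((intron, delta))
    # Rank by absolute change so both gains and losses of retention surface.
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


ranked = rank_ir_changes(ir_values, "control", "treated")
for intron, delta in ranked:
    print(f"{intron}\t{delta:+.3f}")
```

This is only a descriptive ranking, not a test; it is meant to show whether the size and direction of the IR changes look plausible before any p-values are computed.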
Please remember that the purpose of this site is to report bugs, not to discuss biology. Feel free to email me directly if your question is NOT about an IRFinder crash, and I'm more than happy to help (my email is at the end of the front page of the IRFinder Wiki). If you don't mind, I'll close this report for now.
Could you let me know if a non-polyA library could be used for intron retention detection?