using DESeq2 with IRFinder

jessh1 commented 6 years ago

Hello, Thank you for your work on IRFinder. I have been attempting to use DESeq2 as described in the manual to test for significance of intron retention between two groups of samples (multiple replicates). This works fine when the design(dds) model includes only ~Condition + Condition:IRFinder. However, I have a couple of covariates that I would like to account for, and I have changed the model accordingly. When I do this, the model fails to converge for some rows:

23 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Increasing the "maxit" value does not help in this case (the rows still don't converge). I have attempted to "clean" the resulting data using:

ddsClean <- dds[which(mcols(dds)$betaConv),]

and then continuing with the comparisons, but I am unsure whether this is the correct approach. Do you have any suggestions as to how to work around this issue? Many thanks, Jessica

dg520 commented 6 years ago

Hi Jessica,

It is totally normal that a model doesn't fit for all introns. The more complicated the model, the more likely it is that some introns will not fit. Your approach makes perfect sense.

Please note that DESeq2 applies the same model to all introns, while the parameters of the model are estimated per intron. That is, we assume all introns follow the same "regulation" (i.e. model) rules. The assumption itself is arguably true.

When the model cannot converge, it is usually not for lack of iterations; the default iteration number is already high enough. It more likely means that the data for those introns are far from what the model can fit.

23 introns is really a small number, considering the total number of introns in the transcriptome. I personally wouldn't worry about that. If you're concerned about the correctness of the model, there are many ways to judge its goodness of fit. Please refer to some linear regression books.

Best, Dadi

jessh1 commented 6 years ago

Thank you, Dadi. This is very helpful!

One last question: is there a recommended way to filter out introns with low coverage? For example, would you recommend cross-referencing intron names back to the individual sample "IRFinder-IR-dir.txt" files for the coverage calculation, or is it possible to use the baseMean values in the final "res.diff" results from the GLM?

Thank you again, Jessica

dg520 commented 6 years ago

Hi Jessica,

In my experience, I usually filter on both the individual IR report from IRFinder-IR-dir.txt and the differential IR calculated by the DESeq2 approach. My settings are as follows (totally personal, just to give you some new thoughts):

For each IRFinder-IR-dir.txt file, I filter on:
Column 8 (fraction of the intron region covered by RNA-seq reads): >= 0.7
Column 19 (number of "correct" splicing events that splice out the intron): >= 10 or >= 5, depending on RNA-seq depth
Column 21 (quality control of the RNA-seq reads that support the current intron): keep introns marked - or NonUniformIntronCover

This gives high-quality introns for each sample. I only use introns that pass the thresholds across all samples in the downstream differential IR analysis. For each valid intron, I average the IR ratio (column 20) across the samples of each condition (e.g. control group or treatment group) and use it later.
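A minimal base-R sketch of the per-sample filtering described above, assuming each IRFinder-IR-dir.txt file has already been read into a data frame (for example with read.delim). The column positions 8, 19, 20 and 21 come from the post. The function names, the toy data, and the intron key built from columns 1, 2, 3 and 6 (assumed to be chromosome, start, end and strand) are illustrative assumptions, not part of IRFinder.

```r
# Sketch only: column positions (8, 19, 20, 21) follow the post above;
# every function name here is hypothetical and not part of IRFinder.

# Assumed intron key: columns 1, 2, 3 and 6 (taken to be chr, start, end, strand).
intron_id <- function(tab) {
  paste(tab[[1]], tab[[2]], tab[[3]], tab[[6]], sep = ":")
}

# Keep the rows of one sample's table that pass the per-sample thresholds.
filter_sample <- function(tab, min_cov = 0.7, min_splice = 10) {
  keep <- tab[[8]] >= min_cov &
    tab[[19]] >= min_splice &
    tab[[21]] %in% c("-", "NonUniformIntronCover")
  tab[which(keep), , drop = FALSE]  # which() also drops rows where keep is NA
}

# Introns that pass the thresholds in every sample.
common_introns <- function(tabs) {
  Reduce(intersect, lapply(tabs, function(tab) intron_id(filter_sample(tab))))
}

# Mean IR ratio (column 20) per intron across the samples of one condition.
mean_ir <- function(tabs, ids) {
  ratios <- vapply(tabs, function(tab) tab[[20]][match(ids, intron_id(tab))],
                   numeric(length(ids)))
  setNames(rowMeans(matrix(ratios, nrow = length(ids))), ids)
}

# Toy stand-in for a 21-column IRFinder table (made-up values).
toy <- function(cov, splice, ratio, warn) {
  n <- length(cov)
  tab <- as.data.frame(matrix(NA, nrow = n, ncol = 21))
  tab[[1]] <- rep("chr1", n)
  tab[[2]] <- seq_len(n) * 100
  tab[[3]] <- tab[[2]] + 50
  tab[[6]] <- rep("+", n)
  tab[[8]] <- cov
  tab[[19]] <- splice
  tab[[20]] <- ratio
  tab[[21]] <- warn
  tab
}

s1 <- toy(c(0.9, 0.5, 0.8), c(20, 30, 12), c(0.2, 0.1, 0.4),
          c("-", "-", "LowCover"))
s2 <- toy(c(0.95, 0.9, 0.9), c(15, 40, 11), c(0.4, 0.3, 0.2),
          c("-", "NonUniformIntronCover", "-"))

ids <- common_introns(list(s1, s2))   # introns passing in both samples
means <- mean_ir(list(s1, s2), ids)   # their mean IR ratio in this group
```

With real data, `common_introns()` would be called on all samples together, and `mean_ir()` once per condition on that condition's samples. Lowering `min_splice` to 5 corresponds to the shallower-depth setting mentioned above.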

For differential IR analysis, I filter on:
p.adj < 0.05
Average IR ratio change between the two conditions: >= 0.1
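The two differential thresholds can be applied together roughly as follows. This is a sketch under assumptions: `res` stands for the DESeq2 results converted to a data frame with intron IDs as row names and a `padj` column, `ir_a` and `ir_b` are named vectors of per-condition mean IR ratios, and the change is taken as an absolute difference (the post does not say whether direction matters). The function name and toy values are hypothetical.

```r
# Sketch only: select introns with padj < alpha and |mean IR change| >= min_delta.
# `res`  : data frame with a `padj` column, row names = intron IDs (assumed layout)
# `ir_a` : named numeric vector, mean IR ratio per intron in condition A
# `ir_b` : same for condition B
select_diff_ir <- function(res, ir_a, ir_b, alpha = 0.05, min_delta = 0.1) {
  ids <- Reduce(intersect, list(rownames(res), names(ir_a), names(ir_b)))
  delta <- unname(ir_b[ids] - ir_a[ids])
  padj <- res[ids, "padj"]
  # DESeq2 can report NA adjusted p-values; treat those as not significant.
  keep <- !is.na(padj) & padj < alpha & abs(delta) >= min_delta
  out <- data.frame(intron = ids, padj = padj, delta_ir = delta,
                    stringsAsFactors = FALSE)
  out[keep, , drop = FALSE]
}

# Toy inputs (made-up values).
res <- data.frame(padj = c(0.01, 0.2, 0.03, NA),
                  row.names = c("i1", "i2", "i3", "i4"))
ctrl <- c(i1 = 0.10, i2 = 0.10, i3 = 0.20, i4 = 0.30)
trt  <- c(i1 = 0.35, i2 = 0.50, i3 = 0.25, i4 = 0.60)

hits <- select_diff_ir(res, ctrl, trt)
```

Here only `i1` survives: `i2` fails the adjusted p-value cut, `i3` changes by less than 0.1, and `i4` has no adjusted p-value.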

I hope this makes sense to you.

Best, Dadi

williamritchie / IRFinder

using DESeq2 with IRFinder #38