williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Problem on IRratio calculation and coverage #150

Open LeihuanHuang opened 3 years ago

LeihuanHuang commented 3 years ago

Hi,

I have been using IRFinder-1.3.1 to calculate IR from my BAM files, and it works well. From your published paper and the documentation, I learned that IRratio = IntronDepth / (max(splices right, splices left) + IntronDepth). But when I examine the output, I find that when IntronDepth < 1, the output IRratio = Coverage / (max(splices right, splices left) + Coverage), i.e. Coverage is used instead of IntronDepth to calculate IRratio.

Below I post the original IRratio from the output, together with the value I recomputed using either IntronDepth or Coverage. When IntronDepth >= 1 or IntronDepth = Coverage (rows highlighted in green), the IRratio calculated from IntronDepth matches the output IRratio. When IntronDepth < 1 (rows highlighted in red), the output IRratio matches the IRratio calculated from Coverage. Is this a bug?

Another question: since IntronDepth is the median depth of the intronic region without the excluded regions, is Coverage the fraction of bases with mapped reads across the entire intronic region, or across the intronic region without the excluded regions? I ask because I don't understand why IntronDepth is not always 0 when Coverage is below 0.5.

Thanks in advance for your reply!

[image]
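The check described above can be sketched in Python roughly as follows. This is only an illustration of the comparison, not IRFinder's own code: the function names `ir_ratio` and `which_input_matches` and the tolerance are hypothetical, and the rows would come from the IRFinder output columns mentioned in this post.

```python
import math

def ir_ratio(abundance, splices_left, splices_right):
    """IRratio = abundance / (max(splices right, splices left) + abundance)."""
    denom = max(splices_left, splices_right) + abundance
    # Guard against an intron with no intronic signal and no spliced reads.
    return abundance / denom if denom > 0 else 0.0

def which_input_matches(reported, intron_depth, coverage,
                        splices_left, splices_right, tol=1e-6):
    """Report whether the output IRratio agrees with the value recomputed
    from IntronDepth, from Coverage, from both, or from neither."""
    from_depth = ir_ratio(intron_depth, splices_left, splices_right)
    from_cov = ir_ratio(coverage, splices_left, splices_right)
    depth_ok = math.isclose(reported, from_depth, abs_tol=tol)
    cov_ok = math.isclose(reported, from_cov, abs_tol=tol)
    if depth_ok and cov_ok:
        return "both"
    if depth_ok:
        return "IntronDepth"
    if cov_ok:
        return "Coverage"
    return "neither"
```

Applying `which_input_matches` to every row and tabulating the result against `IntronDepth < 1` reproduces the green/red split described above.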

dg520 commented 3 years ago

@LeihuanHuang I do appreciate your effort in exploring the details behind the IR ratio calculation.
Coverage is calculated before the excluded regions are taken out. Its denominator is the original length of the intron, NOT its effective length (i.e. the length without excluded bases).
If IntronDepth < 1, we use IRratio = Coverage / (max(splices right, splices left) + Coverage) on purpose. An extremely low IntronDepth usually means the intronic reads are sparsely distributed (e.g. few reads overlap each other). In that case, the coverage itself seems to be a better summary of the overall IR signal. I personally agree it might not be the best solution here. I would also argue that some filtering steps could be taken to remove low-IR candidates before further analysis.
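A toy Python sketch of the two points above, under stated assumptions: the per-base depths and the excluded mask are invented, the function names are hypothetical, and IntronDepth is taken to be a plain median over non-excluded bases as described in the question (the real IRFinder computation may differ in detail).

```python
from statistics import median

def coverage_full_intron(depth):
    """Fraction of bases with at least one read, over the ORIGINAL intron length."""
    return sum(1 for d in depth if d > 0) / len(depth)

def median_depth_effective(depth, excluded):
    """Median depth over the effective intron only (excluded bases removed)."""
    kept = [d for d, is_excluded in zip(depth, excluded) if not is_excluded]
    return median(kept) if kept else 0.0

def ir_ratio_with_fallback(intron_depth, coverage, splices_left, splices_right):
    """Use IntronDepth when it is >= 1, otherwise fall back to Coverage."""
    abundance = intron_depth if intron_depth >= 1 else coverage
    denom = max(splices_left, splices_right) + abundance
    return abundance / denom if denom > 0 else 0.0

# Invented 10-base intron: the first 6 bases are excluded and carry no reads.
depth = [0, 0, 0, 0, 0, 0, 3, 2, 4, 1]
excluded = [True] * 6 + [False] * 4

cov = coverage_full_intron(depth)                    # 4 covered bases out of 10
med = median_depth_effective(depth, excluded)        # median of [3, 2, 4, 1]
ratio = ir_ratio_with_fallback(med, cov, 8, 6)       # IntronDepth >= 1, so it is used
```

Because the two quantities are measured over different sets of bases, Coverage can fall below 0.5 while IntronDepth stays above 0, as in this toy intron. A downstream filter on low-coverage or low-depth introns, as suggested above, would be applied after this step.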

LeihuanHuang commented 3 years ago

Thanks Dadi, your answer solves my puzzle!