mhalushka / miRge3.0

Comprehensive analysis of small RNA sequencing data
MIT License

Mapped.csv miR counts do not equal miR.Counts.csv #22

Closed: ray1919 closed this issue 3 years ago

ray1919 commented 3 years ago

I ran miRge3.0 on four small datasets and found that the miR.Counts.csv results do not equal the mapped.csv results, as shown below:

[Image: miRge]

In the last sample, the count is not the sum of the counts in the mapped results, whereas in the other three samples it is.

When I looked at other miRNA results, the problem seemed to be widespread.

How can this difference be explained?
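The check described above (summing the mapped.csv counts per miRNA and comparing them with miR.Counts.csv) can be scripted. The following is a minimal sketch that assumes a file layout not confirmed in this thread: mapped.csv is taken to have `exact miRNA` and `isomiR miRNA` columns plus one count column per sample, and miR.Counts.csv is taken to have a `miRNA` name column plus the same sample columns. The helper names (`to_int`, `sum_mapped`, `find_mismatches`) and the toy data are hypothetical and only illustrate the comparison.

```python
import csv
import io
from collections import defaultdict

# Assumed (hypothetical) column names; check them against your own output files.
EXACT_COL = "exact miRNA"
ISOMIR_COL = "isomiR miRNA"
NAME_COL = "miRNA"  # assumed name column of miR.Counts.csv


def to_int(value):
    """Parse a count cell such as '3', '3.0' or '' (empty means 0)."""
    return int(float(value)) if value else 0


def sum_mapped(mapped_file, samples):
    """Sum each sample's reads per miRNA over all mapped sequences."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(mapped_file):
        # Credit the sequence to its exact (canonical) name, else its isomiR name.
        name = row.get(EXACT_COL) or row.get(ISOMIR_COL)
        if not name:
            continue  # sequence not assigned to any miRNA
        for sample in samples:
            totals[name][sample] += to_int(row.get(sample))
    return totals


def find_mismatches(mapped_file, counts_file):
    """List (miRNA, sample, mapped_sum, reported_count) tuples that disagree."""
    reader = csv.DictReader(counts_file)
    samples = [col for col in reader.fieldnames if col != NAME_COL]
    reported = {row[NAME_COL]: row for row in reader}
    mismatches = []
    for name, per_sample in sorted(sum_mapped(mapped_file, samples).items()):
        for sample in samples:
            reported_count = to_int(reported.get(name, {}).get(sample))
            if per_sample[sample] != reported_count:
                mismatches.append((name, sample, per_sample[sample], reported_count))
    return mismatches


# Made-up toy data for illustration only (not taken from this issue).
MAPPED_CSV = """Sequence,exact miRNA,isomiR miRNA,S1,S2
SEQ1,miR-3200-3p,,3,0
SEQ2,,miR-3200-3p,1,4
"""
COUNTS_CSV = """miRNA,S1,S2
miR-3200-3p,4,0
"""

if __name__ == "__main__":
    print(find_mismatches(io.StringIO(MAPPED_CSV), io.StringIO(COUNTS_CSV)))
    # prints [('miR-3200-3p', 'S2', 4, 0)]
```

To run it on real output, open both files with `open(path, newline="")` and pass the file objects to `find_mismatches`; any row it returns marks a miRNA and sample where the two files disagree.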

ray1919 commented 3 years ago

It seems that the --crThreshold parameter does deal with this problem. I wonder how to determine its preferred value.

arunhpatil commented 3 years ago

Hi @ray1919,

You are right about --crThreshold: you can increase it up to 0.5 (the default is 0.1). However, this won't make much difference, because we would expect inaccurate counts if canonical miRNA reads are not accounted for. As you can see in the last sample (NWZ_0661), the reads for miR-3200-3p map only to isomiRs (column: isomiR miRNA) and none map to the canonical miRNA (column: exact miRNA).

These low-depth reads produce erroneous counts. Biologically, they could result from erroneous bases (sequencing errors) at the ends of the reads or from low-abundance reads for miR-3200-3p. They could also result from an incorrect adapter sequence being used during trimming. Furthermore, we require at least 2 reads to call an exact (canonical) miRNA; this minimum cannot be adjusted.

To summarize, a miRNA must have at least two canonical (exact miRNA) reads to be included in the miR.Counts and miR.RPM values. Furthermore, the threshold value applies only when both canonical and isomiR reads are present. I hope this is clear. Please let us know if you need any further information.

Thank you, Arun.
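The rule summarized in the reply above can be restated as a small decision function. This is a hypothetical sketch written only from this thread, not miRge3.0's actual code: the function name `reported_count`, the canonical-ratio formula exact / (exact + isomiR), and the choice to report 0 when that ratio falls below `cr_threshold` are all assumptions. It illustrates why an isomiR-only miRNA, like miR-3200-3p in NWZ_0661, can appear in mapped.csv yet receive no count in miR.Counts.csv whatever threshold is chosen.

```python
def reported_count(exact_reads, isomir_reads, cr_threshold=0.1, min_exact_reads=2):
    """Hypothetical sketch of the miR.Counts rule described in this thread.

    Assumptions (not taken from the miRge3.0 source):
    - a miRNA needs at least `min_exact_reads` canonical (exact) reads;
    - the canonical-ratio check applies only when isomiR reads are also present,
      and a ratio below `cr_threshold` drops the miRNA (reported as 0).
    """
    if not 0.0 <= cr_threshold <= 1.0:
        raise ValueError("cr_threshold must be a proportion between 0 and 1")
    if exact_reads < min_exact_reads:
        return 0  # isomiR-only or single-read miRNAs are not reported
    if isomir_reads > 0:
        canonical_ratio = exact_reads / (exact_reads + isomir_reads)
        if canonical_ratio < cr_threshold:
            return 0  # assumed behaviour when canonical reads are too rare
    return exact_reads + isomir_reads


if __name__ == "__main__":
    # (exact, isomiR) pairs are made-up values for illustration.
    for exact, isomir in [(0, 4), (1, 5), (3, 1), (2, 30), (5, 0)]:
        print(exact, isomir, reported_count(exact, isomir))
```

Under these assumptions, lowering `cr_threshold` can only rescue miRNAs that already have at least two canonical reads, such as the (2, 30) case with `cr_threshold=0.05`; isomiR-only miRNAs stay at zero.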