rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

Minimum read length in miRDeep2 main script #101

Closed JoseCorCab closed 2 years ago

JoseCorCab commented 2 years ago

Dear all, I have been using your tool by running the main script. Until recently I had no problems with the hardcoded minimum read length threshold of 17 in sanity_check_reads_ready_file.pl, because most of my reads were longer than this value. However, I am currently looking for miRNAs in sequencing samples where most of the reads are shorter than 17 nt, so miRDeep2 uses only a small percentage of them. I think that being able to change this threshold from the command line would be an interesting improvement for this script.
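To make the request concrete, here is a minimal Python sketch of a read-length filter whose threshold comes from the command line instead of being hardcoded. It is not miRDeep2 code (miRDeep2 itself is written in Perl); the function names `filter_fasta_by_length` and `parse_args` and the `--min-length` option are hypothetical and exist only for this illustration.

```python
import argparse


def filter_fasta_by_length(lines, min_length):
    """Yield (header, sequence) pairs whose sequence has at least min_length nt."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one if it is long enough.
            if header is not None and len("".join(chunks)) >= min_length:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    # Emit the last record of the input.
    if header is not None and len("".join(chunks)) >= min_length:
        yield header, "".join(chunks)


def parse_args(argv=None):
    """Parse a hypothetical --min-length option (default 17, as in the issue)."""
    parser = argparse.ArgumentParser(
        description="Keep only reads of at least a minimum length."
    )
    parser.add_argument(
        "--min-length",
        type=int,
        default=17,
        help="minimum read length in nt (hypothetical option, default: 17)",
    )
    return parser.parse_args(argv)


# Made-up reads of length 18, 15 and 10 nt.
demo_reads = [
    ">read_1", "ACGTACGTACGTACGTAC",
    ">read_2", "ACGTACGTACGTACG",
    ">read_3", "ACGTACGTAC",
]
args = parse_args(["--min-length", "15"])
for header, seq in filter_fasta_by_length(demo_reads, args.min_length):
    print(header, len(seq))
```

With the default of 17 only the 18 nt read would be kept; lowering the option to 15 also keeps the 15 nt read.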

Drmirdeep commented 2 years ago

There are many interesting improvements that could be made. However, as long as the code works and there is no bug, a request like this will not lead to a change of the source. If you need 17 nt, you can apparently change it easily in your installation files. I would be really worried if most of my reads were only 17 nt long, unless the species under investigation is known to have mature miRNAs of only 17 nt in length.

JoseCorCab commented 2 years ago

Dear @Drmirdeep, thank you for your answer. We are quite concerned about the short length of the reads (the libraries are from mouse and human). Because of this, we are trying to confirm that the problem lies not in the software but in the sequencing library preparation. We appreciate your advice and we will take your opinion very seriously.

We are using miRDeep2 to detect any possible miRNA in our samples (we are not too concerned with discovery quality because we apply other downstream filters), and we have seen miRDeep2 behavior that we don't fully understand. We run miRDeep2 and keep all miRNAs detected with a significant randfold p-value and no Rfam alert. Using these short-read libraries, we have run miRDeep2 with different minimum read length thresholds: 18, 17 (both available in your latest miRDeep version), 15 and 10 (we had to modify our local miRDeep code to apply these two).

When we reduced the threshold, more reads were mapped, so the miRNAs detected in all 4 runs were supported by more reads and their miRDeep score increased, as expected. However, although more reads were mapped, miRDeep2 discovered fewer miRNAs. It appears as though the additional reads are actually reducing the confidence in the miRNAs. Is this expected behavior for miRDeep? If so, could you explain why? I can give you more information about the runs if you need it.
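As a rough way to quantify how much each threshold changes the input, the following Python sketch computes the fraction of reads that pass a given minimum length. The read lengths are made-up numbers for illustration, not data from these libraries, and `fraction_retained` is a hypothetical helper that is not part of miRDeep2.

```python
from bisect import bisect_left


def fraction_retained(read_lengths, thresholds):
    """Map each threshold to the fraction of reads with length >= threshold."""
    ordered = sorted(read_lengths)
    total = len(ordered)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    # bisect_left returns how many reads are strictly shorter than t.
    return {t: (total - bisect_left(ordered, t)) / total for t in thresholds}


# Made-up read lengths in nt; the thresholds are the four tried in this thread.
example_lengths = [22, 21, 18, 17, 16, 15, 15, 12, 10, 9]
for threshold, fraction in fraction_retained(example_lengths, (18, 17, 15, 10)).items():
    print(f"min length {threshold}: {fraction:.0%} of reads retained")
```

Running the same count on the real libraries would show how many extra reads each lower threshold lets in; it says nothing by itself about why fewer miRNAs are reported.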