rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

quantification of expression on novel miRNA detected #120

Closed moxgreen closed 3 months ago

moxgreen commented 3 months ago

I have some doubts about miRDeep2, which I am using for the first time. I have not found any problem in the code; the issue is more about the interpretation of the results. I'm not sure this is the correct place to post this kind of message, so excuse me if it is not.

I am analyzing data from Citrus sinensis, which is not a well-studied species, so it seems important to also run discovery of novel miRNAs, and in fact miRDeep2 finds quite a few of them.

Doubt 1: The estimated probability that a detected novel miRNA is a true positive is not monotonic in the score. For example, the top-scoring one has a score of 3.8e+3 and a probability of only 0.17 ± 0.14, while I find some with a score of 1.6 and a probability of 0.47 ± 0.07. How should these data be filtered?

Doubt 2: In the documentation and tutorials, I never see anyone doing expression quantification in 2 steps. They always run miRDeep2.pl only once, but in the "miRNA_expresses_all_sample" file, only miRNAs from miRBase are quantified (it seems). Wouldn't it be necessary to re-run the quantification also with the added novel miRNAs? Or maybe in my case, the novel miRNAs are all poor quality, and that's why they are not considered in the quantification?

For reference, I attach the result file miRDeep2.pdf.

https://bioinformatics.stackexchange.com/questions/22530/quantification-of-novel-mirna-expression-in-mirdeep2

mschilli87 commented 3 months ago

I can chime in on doubt 2: I typically predict novel miRNAs using all samples pooled for maximum sensitivity, then filter (novel miRNAs by score/probabilities, known ones by abundance), and then run the quantifier a second time per sample to get comparable abundance estimates across miRNAs. Regarding doubt 1, maybe @Drmirdeep can add some more information. I know that miRDeep2 was developed with animal miRNAs in mind, but that should not affect your rather technical concern.
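The filtering step between the pooled prediction run and the second quantifier run could be sketched roughly as below. This is a minimal illustration, not part of miRDeep2: the `Candidate` record, the `filter_candidates` helper, and both thresholds are hypothetical, and it assumes the score and the estimated true-positive probability have already been read from the result table. The first two example rows use the values quoted in doubt 1; the third row is made up.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str       # hypothetical provisional ID, not a miRDeep2 identifier
    score: float    # miRDeep2 score of the novel candidate
    tp_prob: float  # estimated probability that the candidate is a true positive


def filter_candidates(candidates, min_score=4.0, min_tp_prob=0.5):
    """Keep candidates that pass BOTH thresholds.

    The default thresholds are arbitrary placeholders for illustration;
    they are not recommended values.
    """
    return [
        c for c in candidates
        if c.score >= min_score and c.tp_prob >= min_tp_prob
    ]


candidates = [
    Candidate("novel_1", 3800.0, 0.17),  # values quoted in the issue
    Candidate("novel_2", 1.6, 0.47),     # values quoted in the issue
    Candidate("novel_3", 12.0, 0.85),    # made-up example row
]

kept = filter_candidates(candidates)
print([c.name for c in kept])
```

Requiring both conditions is one possible choice when score and probability disagree, as they do in the examples from doubt 1; whether that is appropriate depends on how the probabilities were estimated for the data set.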

Drmirdeep commented 3 months ago

Have a look at the methods described in the papers and at what each step is doing, especially how the score is composed and how the probabilities are calculated.

The quantifier.pl will run on anything that you input.
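Since the quantifier works on whatever sequences it is given, one way to include novel miRNAs in a second quantification run is to merge the known sequences with the filtered novel ones into a single FASTA input beforehand. The sketch below is only an illustration under stated assumptions: `parse_fasta` and `merge_fasta` are hypothetical helpers, the record names and sequences are placeholders, and the exact input files and options expected by quantifier.pl should be checked against its own help text.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def merge_fasta(known_text, novel_text):
    """Concatenate known and novel records, keeping the first copy of each ID.

    The ID is taken to be the first whitespace-delimited token of the header.
    """
    seen = set()
    out = []
    for header, seq in parse_fasta(known_text) + parse_fasta(novel_text):
        ident = header.split()[0]
        if ident in seen:
            continue  # skip duplicate IDs so each entry is quantified once
        seen.add(ident)
        out.append(f">{header}\n{seq}")
    return "\n".join(out) + "\n" if out else ""


# Placeholder records; names and sequences are invented for illustration.
known = ">known_1\nACGU\n"
novel = ">novel_3 extra\nGG\nCC\n>known_1\nAAAA\n"
merged = merge_fasta(known, novel)
print(merged, end="")
```

The same merge would be applied separately to the mature and precursor sequences, and the merged files would then serve as input for the per-sample quantification described earlier in the thread.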