One species' two genomes get quite different results

lisaixi commented 2 years ago

Dear Doctor Reza Hammond,

Sorry to bother you. I would like to use your outstanding tool, but I ran into an issue with miRador. Using the same miRNA-seq data with two different reference genomes, I get two different results: the number of miRNAs identified differs by as many as 20 (141 vs. 121)! To check whether the reads are actually present for the genome that yields fewer identifications, I searched directly for one miRNA in the raw reads file and, surprisingly, found more than 900 matches. Moreover, I successfully mapped this miRNA to both reference genomes I used. Hence, I presume that this issue is not caused by differences between the genomes.
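
For reference, a direct search like the one described above can be sketched as follows. This is only a minimal illustration, assuming an uncompressed four-line-per-record FASTQ file; the function name and the example sequences are hypothetical, and a substring match is used so that untrimmed adapter bases do not hide a hit.

```python
def count_reads_containing(fastq_lines, query):
    """Count reads whose sequence line contains `query`.

    Assumes standard 4-line FASTQ records, so the sequence is the
    second line of every record (index 1, 5, 9, ...).
    """
    # Normalize the query to uppercase DNA so an RNA-style sequence also works.
    query = query.strip().upper().replace("U", "T")
    hits = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1 and query in line.strip().upper():
            hits += 1
    return hits


# Tiny in-memory example (made-up reads, not real data).
example_fastq = [
    "@read1", "TGACAGAAGAGAGTGAGCACAGATCGG", "+", "IIIIIIIIIIIIIIIIIIIIIIIIIII",
    "@read2", "TTTGGATTGAAGGGAGCTCTA", "+", "IIIIIIIIIIIIIIIIIIIII",
    "@read3", "tgacagaagagagtgagcac", "+", "IIIIIIIIIIIIIIIIIIII",
]
example_hits = count_reads_containing(example_fastq, "UGACAGAAGAGAGUGAGCAC")
```

With a real file, the same function can be called on an open file handle, for example `count_reads_containing(open("reads.fastq"), "...")`, where `reads.fastq` is a placeholder name.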

So I'd like to ask for your assistance. Why would two genomes produce so many differences in the results, especially when the unidentified miRNA can be matched in both the genome and the reads? Looking forward to your reply, and thank you very much!

Yours sincerely, Saixi Li

rkweku commented 2 years ago

Hello Saixi,

Thank you for reaching out and utilizing miRador. This behavior actually doesn't sound unexpected to me, but allow me to explain.

For an sRNA to be identified as a candidate miRNA, it must meet several requirements, one of which is highly dependent on the genome. To summarize miRador's steps, it first identifies potential precursor miRNAs by predicting inverted repeats within the reference genome without ever looking at your reads files. This means that before we even begin, miRador's initial set of potential precursor miRNAs is entirely dependent on the genome. miRador then maps the reads to the genome and identifies which of the precursor miRNAs look like potential miRNA genes by identifying potential miRNA:miRNA* duplexes within the set of inverted repeats. Finally, miRador verifies that the alignment between the miRNA and miRNA* sequences, as well as their abundances relative to other reads mapping to the precursor, are within bounds according to our defined rules (as defined in this paper: https://academic.oup.com/plcell/article/30/2/272/6099069).
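
To illustrate why the first step depends so strongly on the genome, here is a toy sketch. It is not miRador's actual inverted-repeat detection (real predictors tolerate mismatches and gaps, while this one requires an exact reverse complement), and the sequences, function names, and parameters are all made up. It only shows how a mature sequence can be present in two genomes while the surrounding hairpin is detectable in just one of them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_inverted_repeat(genome, arm_len, max_loop):
    """Toy check: is any arm of `arm_len` bases followed, within
    `max_loop` bases, by its exact reverse complement?"""
    genome = genome.upper()
    for start in range(len(genome) - arm_len + 1):
        arm = genome[start:start + arm_len]
        window_start = start + arm_len
        # The second arm must fit entirely inside this window.
        window_end = window_start + max_loop + arm_len
        if reverse_complement(arm) in genome[window_start:window_end]:
            return True
    return False


# Hypothetical locus in two cultivars: same 5' arm, one base differs in the 3' arm.
five_prime_arm = "GATTACAGGC"
cultivar_a = five_prime_arm + "TTTT" + reverse_complement(five_prime_arm)
cultivar_b = five_prime_arm + "TTTT" + "GCCTGTTATC"

mature = "TTACAGG"  # made-up "mature" sequence inside the 5' arm
present_in_both = mature in cultivar_a and mature in cultivar_b
repeat_in_a = has_inverted_repeat(cultivar_a, arm_len=10, max_loop=6)
repeat_in_b = has_inverted_repeat(cultivar_b, arm_len=10, max_loop=6)
```

In this toy case the short sequence maps to both loci, but only the first locus passes the inverted-repeat check, so only that one would enter the candidate precursor set.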

The genome that you are utilizing is effectively the underlying map that we utilize to identify miRNA genes. These maps will differ greatly across species, but they will also differ with different genome versions of the same species. I'm assuming your variance is across two different genome versions? Are all 121 in one genome also identified in the other where 141 were identified? I would be concerned if there was little overlap between the two sets of miRNAs identified with two different genome versions, but I would suspect that there is significant overlap in the predicted miRNAs where the underlying species is the same, but the genome version differs.
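
A minimal sketch of that overlap check, assuming the mature miRNA sequences from each run have been exported one per line (that export format, the function names, and the example sequences are assumptions for illustration, not miRador's actual output layout):

```python
def load_sequences(lines):
    """Build a set of sequences from an iterable of lines.

    Sequences are uppercased and U is converted to T so that RNA-style
    and DNA-style spellings of the same miRNA compare as equal.
    """
    return {
        line.strip().upper().replace("U", "T")
        for line in lines
        if line.strip()
    }


def compare_runs(seqs_a, seqs_b):
    """Split two sets of predicted miRNAs into shared and run-specific."""
    return {
        "shared": seqs_a & seqs_b,
        "only_a": seqs_a - seqs_b,
        "only_b": seqs_b - seqs_a,
    }


# Tiny in-memory example (made-up lists standing in for the two runs).
run_a = load_sequences(["UGACAGAAGAGAGUGAGCAC\n", "TTTGGATTGAAGGGAGCTCTA\n", "\n"])
run_b = load_sequences(["tgacagaagagagtgagcac\n"])
overlap = compare_runs(run_a, run_b)
```

The sequences in `only_a` and `only_b` are the ones worth inspecting locus by locus, since they are predicted with one genome but not the other.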

I hope this helps add some clarity, but please let me know if I could explain better.

Best, Reza Hammond

lisaixi commented 2 years ago

Dear Dr. Reza,

Thanks for the reply; your explanation is greatly appreciated. The genomes I used actually belong to two different cultivars of the same species. There is substantial overlap between the identified miRNAs, as you suspected: 120 of the 126 (not 121, sorry for my mistake) can also be found among the 141. Regrettably, the miRNAs that were not predicted are exactly the ones we are studying.

Following your advice, I decided to BLAST the sequence of one precursor miRNA directly against the two genomes. The result is that I can successfully map it to the genome in which it was not predicted: 100% identity and an E-value of 2e-16. Therefore, I would expect miRador to consider this miRNA as one of the candidates.

So, can we communicate directly via email in the future? My e-mail address is 1849881996@qq.com. Thank you very much for your patient explanation.

Yours, Saixi Li
