rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

No output when running quantifier with MirGeneDB seqs #115

Closed iammrtza closed 1 year ago

iammrtza commented 1 year ago

Hi dear,

I am working on miRNA quantification using quantifier.pl. I need to quantify against mature and precursor sequences (in FASTA format) obtained from MirGeneDB. Here's the command I used:

quantifier.pl -p pre_miRNAs.fa -m mature_miRNAs.fa -r reads.fa

Unfortunately, the output I obtained was not as expected. Specifically, I received the following messages:

0 mature mappings to precursors

and

Warning: 0 mature sequences mapped to any of your given precursor sequences

The resulting output file, miRNAs_expressed_all_samples_1691413314.csv, was empty even though reads mapped to the provided miRNA sequences. I also confirmed with grep that the mature sequences do occur within the precursor sequences.

I would greatly appreciate any assistance in resolving this issue. Thank you kindly.

mschilli87 commented 1 year ago

Could you please share the first few lines of all three FASTA files involved?

iammrtza commented 1 year ago

Hi dear,

Thank you for your reply; here they are:

reads.fa:

>HR1_0_x728189
GGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGC
>HR1_1_x687242
GGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTT
>HR1_2_x448464
CTCAGGATGGCGGAGCGGTCT
>HR1_3_x431476
CTTTTAGCTGGGGTTGTAGGACAGC
>HR1_4_x269506
GCTTTTAGCTGGGGTTGTAGGACAGC

mature:

>Ovu-Bantam_3p
TGAGATCATTGTGAAAACTGGTT
>Ovu-Let-7_5p
TGAGGTAGTAGGTTGTATAGTT
>Ovu-Mir-1_3p
TGGAATGTAAAGAAGTATGTTC
>Ovu-Mir-2-o20-v1_3p
TCACAGCCAGCTTTGATGAGCC
>Ovu-Mir-2-o20-v2_3p
TATCACAGCCAGCTTTGATGAGCC

pre:

>Ovu-Bantam_pre
CTGGTTTTCATAATGATTTTGCAGAATGTGTCATGTTTCTGAGATCATTGTGAAAACTGGTT
>Ovu-Let-7_pre
TGAGGTAGTAGGTTGTATAGTTAAGAAATACACCATTTCAAGGAGAACTGTACAACCTTCTAGCTTTCC
>Ovu-Mir-1_pre
ACATTCTTCTTTACTATCTCATAGATTTACTCCAAGTATGGAATGTAAAGAAGTATGTTC
>Ovu-Mir-2-o20-v1_pre
GCATCAATGCTGGATGTCATAGTAAATCTATAGGGCCTATCACAGCCAGCTTTGATGAGCC
>Ovu-Mir-2-o20-v2_pre
GCATCAATGCTGGATGTCATAGTAAATCTATAGGGCCTATCACAGCCAGCTTTGATGAGCC
Drmirdeep commented 1 year ago

Please have a look at the arf file and check whether reads map to the mature and star positions exactly at their 5' ends. It is not enough for reads to map somewhere on the precursor or mature sequence.
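
A minimal sketch of that check, using as sample data the arf line and the miR-184 sequences quoted further down in this thread. It assumes the reference start is the 8th whitespace-separated arf field, that coordinates are 1-based, and that the read is on the plus strand. The helper names are hypothetical and this is not quantifier.pl's own logic; it only compares where a read starts on the precursor with where the mature sequence starts.

```python
def mature_start(precursor: str, mature: str) -> int:
    """Return the 1-based start of `mature` inside `precursor`, or 0 if absent."""
    return precursor.upper().find(mature.upper()) + 1


def read_start_on_precursor(arf_line: str) -> int:
    """Return the reference start, assumed to be the 8th arf field."""
    return int(arf_line.split()[7])


# Sample data copied from this thread.
precursor = "CCTTGTCACTTACTCGTCTAGTCTGTCAAATAAGAACTGGACGGAGAACTGATAAGGGC"
mature = "TGGACGGAGAACTGATAAGGGC"
arf_line = (
    "HR1_285_x4320 21 1 21 tggacggagaactgataaggg "
    "ovu-mir-184-P27_pre 21 38 58 tggacggagaactgataaggg + 0 "
    "mmmmmmmmmmmmmmmmmmmmm"
)

# True when the read and the mature sequence share the same 5' position.
same_5p_end = read_start_on_precursor(arf_line) == mature_start(precursor, mature)
print(same_5p_end)
```

For this read the two start positions agree, which suggests the 5' end alignment is not what is going wrong here.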

iammrtza commented 1 year ago

Thank you for your comment. I have looked at the arf file. As I do not provide star sequences, it only contains mappings to the precursors, for example:

HR1_285_x4320   21  1   21  tggacggagaactgataaggg   ovu-mir-184-P27_pre 21  38  58  tggacggagaactgataaggg   +   0   mmmmmmmmmmmmmmmmmmmmm

HR1_285_x4320 maps to the 3p (mature) site of miR-184, and the mature sequence of this miRNA is also contained in the precursor sequence (sequences below).

>HR1_285_x4320
TGGACGGAGAACTGATAAGGG
>Ovu-Mir-184-P27_pre
CCTTGTCACTTACTCGTCTAGTCTGTCAAATAAGAACTGGACGGAGAACTGATAAGGGC
>Ovu-Mir-184-P27_3p
TGGACGGAGAACTGATAAGGGC

So I do not know what the problem is. I would appreciate any further help, thank you.

Drmirdeep commented 1 year ago

Ok, the issue seems to be that your precursor ids end in _pre, so they of course do not match the mature ids. Just remove the suffix and it will work.

Ovu-Mir-184-P27_pre needs to be Ovu-Mir-184-P27

I think it is written somewhere that these need to match (apart from the 5p and 3p endings in the mature ids), but I don't remember where.
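
A minimal sketch of the renaming, assuming each FASTA header holds only the id, precursor ids end in `_pre`, and mature ids end in `_5p` or `_3p` as in the files above. The function names are hypothetical, and the id comparison at the end is only a sanity check based on the rule described here, not a reproduction of how quantifier.pl matches ids.

```python
import re


def strip_pre_suffix(precursor_id: str) -> str:
    """Drop a trailing '_pre' from a precursor id."""
    return re.sub(r"_pre$", "", precursor_id)


def mature_base_id(mature_id: str) -> str:
    """Drop a trailing '_5p' or '_3p' arm label from a mature id."""
    return re.sub(r"_[35]p$", "", mature_id)


def rename_precursors(fasta_text: str) -> str:
    """Rewrite only the '>' header lines of a FASTA string."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = ">" + strip_pre_suffix(line[1:].strip())
        out.append(line)
    return "\n".join(out) + "\n"


# Two records from the precursor file posted above.
pre_fasta = (
    ">Ovu-Let-7_pre\n"
    "TGAGGTAGTAGGTTGTATAGTTAAGAAATACACCATTTCAAGGAGAACTGTACAACCTTCTAGCTTTCC\n"
    ">Ovu-Mir-1_pre\n"
    "ACATTCTTCTTTACTATCTCATAGATTTACTCCAAGTATGGAATGTAAAGAAGTATGTTC\n"
)
mature_ids = ["Ovu-Let-7_5p", "Ovu-Mir-1_3p"]

fixed = rename_precursors(pre_fasta)
precursor_ids = {l[1:] for l in fixed.splitlines() if l.startswith(">")}

# Mature ids whose base id has no precursor with the same id.
unmatched = [m for m in mature_ids if mature_base_id(m) not in precursor_ids]
print(unmatched)
```

To apply this to a real file, read it with `open(path).read()`, pass the text to `rename_precursors`, and write the result to a new file so the original stays untouched.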

The id matching is necessary because otherwise some mature sequences would map to a different precursor that contains the same mature sequence.
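
The precursor file posted above already contains such a case: the two Mir-2-o20 precursors have identical sequences. A small sketch, using plain substring search as a stand-in for sequence-only matching (an illustration, not the mapping quantifier.pl performs) and ids written without the `_pre` suffix:

```python
# Precursor sequences copied from this thread.
precursors = {
    "Ovu-Mir-2-o20-v1": "GCATCAATGCTGGATGTCATAGTAAATCTATAGGGCCTATCACAGCCAGCTTTGATGAGCC",
    "Ovu-Mir-2-o20-v2": "GCATCAATGCTGGATGTCATAGTAAATCTATAGGGCCTATCACAGCCAGCTTTGATGAGCC",
}
mature_seq = "TCACAGCCAGCTTTGATGAGCC"  # Ovu-Mir-2-o20-v1_3p

# Every precursor that contains the mature sequence.
hits = [name for name, seq in precursors.items() if mature_seq in seq]
print(hits)
```

Both precursors are reported, so the sequence alone cannot tell which precursor the v1 mature belongs to; the id can.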

You could also use option -k instead for less stringent mature-to-precursor mapping, but this is at your own risk; other side effects may show up.

iammrtza commented 1 year ago

Thank you for the suggestion, it helped and now the script works as expected.