rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

negative repeat count does nothing at quantifier.pl line 1312, line 5603. #126

Closed erickfabian0 closed 2 weeks ago

erickfabian0 commented 2 weeks ago

I have a problem with quantifier.pl at line 1312. It prints warnings like the one below:

negative repeat count does nothing at quantifier.pl line 1312, line 5603.

Line 1312 of my miRDeep2_core_algorithm is: my $mat = $hash{$k1}{$k2}{'mature'};;

could you please look into it? thanks!

btw, I’m working on a server (Alliance Canada), and I need to install miRDeep2 using the second method, which doesn't require Perl, because I don't have sudo access on the server. I followed the tutorial, and it worked fine, but for my database, I need to change the format of the identifiers, replacing spaces with underscores ("_").

First, I align the reference genome with a miRBase database using Bowtie, and then I use the mapper tool to obtain the reads_collapsed.fa and reads_collapsed_vs_genome.fa files. Finally, I execute the miRDeep2 command using a mature comparison with the Pan paniscus mature sequence (ppa).

Here is an example of the head of every file I used:

reference genome:

>dna:chromosome:GRCh38:1:1:248956422:1_REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

reads.fa (G010_04011_clean_total.fa; after deleting the spaces: output.fa):

>A00253_627_HMLMGDRXY_1_2101_4128_1000_1_N_0_CACCATAC
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
>A00253_627_HMLMGDRXY_1_2101_5195_1000_1_N_0_CACCTAAC
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
>A00253_627_HMLMGDRXY_1_2101_7310_1000_1_N_0_CACCTTAC
GGCTGGTCCGATGGTAGTGGGTTATCAGAACTTATTA
>A00253_627_HMLMGDRXY_1_2101_7618_1000_1_N_0_CACCTTAC
GGCTGGTCCGATGGTAGTGGGTTATCAGAACT
>A00253_627_HMLMGDRXY_1_2101_8486_1000_1_N_0_CACCTTAC
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC
>A00253_627_HMLMGDRXY_1_2101_9299_1000_1_N_0_CACCTTAC
GGCTGGTCCGATGGTAGTGGGTTATCAGAAC

mature_fi_clean.fa:

>hsa-let-7a-5p_MIMAT0000062_Homo_sapiens_let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p_MIMAT0004481_Homo_sapiens_let-7a-3p
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p_MIMAT0010195_Homo_sapiens_let-7a-2-3p
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p_MIMAT0000063_Homo_sapiens_let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
>hsa-let-7b-3p_MIMAT0004482_Homo_sapiens_let-7b-3p
CUAUACAACCUACUGCCUUCCC
>hsa-let-7c-5p_MIMAT0000064_Homo_sapiens_let-7c-5p
UGAGGUAGUAGGUUGUAUGGUU

hairpin_fi_clean.fa:

>hsa-let-7a-1_MI0000060_Homo_sapiens_let-7a-1_stem-loop
UGGGAUGAGGUAGUAGGUUGUAUAGUUUUAGGGUCACACCCACCACUGGGAGAUAACUAU
>hsa-let-7a-2_MI0000061_Homo_sapiens_let-7a-2_stem-loop
AGGUUGAGGUAGUAGGUUGUAUAGUUUAGAAUUACAUCAAGGGAGAUAACUGUACAGCCU
>hsa-let-7a-3_MI0000062_Homo_sapiens_let-7a-3_stem-loop
GGGUGAGGUAGUAGGUUGUAUAGUUUGGGGCUCUGCCCUGCUAUGGGAUAACUAUACAAU
>hsa-let-7b_MI0000063_Homo_sapiens_let-7b_stem-loop
CGGGGUGAGGUAGUAGGUUGUGUGGUUUCAGGGCAGUGAUGUUGCCCCUCGGAAGAUAAC
>hsa-let-7c_MI0000064_Homo_sapiens_let-7c_stem-loop
GCAUCCGGGUUGAGGUAGUAGGUUGUAUGGUUUAGAGUUACACCCUGGGAGUUAACUGUA
>hsa-let-7d_MI0000065_Homo_sapiens_let-7d_stem-loop
CCUAGGAAGAGGUAGUAGGUUGCAUAGUUUUAGGGCAGGGAUUUUGCCCACAAGGAGGUA

mature_pp_fi.fa:

>ppa-mir-141_MI0002490
UGGCCGGCCCUGGGUCCAUCUUCCAGUACAGUGUUGGAUGGUCUAAUUGUGAAGCUCCUAACACUGUCUGGUAAAGAUGGCCCCCGGGUGGGUUC
>ppa-mir-15b_MI0002493
UUGAGGCCUUAAAGUACUGUAGCAGCACAUCAUGGUUUACAUGCUACAGUCAAGAUGCGAAUCAUUAUUUGCUGCUCUAGAAAUUUAAGGAAAUUCAU
>ppa-mir-23b_MI0002501
CUCAGGUGCUCUGGCUGCUUGGGUUCCUGGCAUGCUGAUUUGUGACUUAAGAUUAAAAUCACAUUGCCAGGGAUUACCACGCAACCACGACCUUGGC
>ppa-mir-30b_MI0002508
ACCAAGUUUCAGUUCAUGUAAACAUCCUACACUCAGCUGUAAUACAUGGAUUGGCUGGGAGGUGGAUGUUUACUUCAGCUGACUUGGA
>ppa-mir-125b_MI0002511
UGCGCUCCUCUCAGUCCCUGAGACCCUAACUUGUGAUGUUUACCGUUUAAAUCCACGGGUUAGGCUCUUGGGAGCUGCGAGUCGU
>ppa-mir-128_MI0002523
UGAGCUGUUGGAUUCGGGGCCGUAGCACUGUCUGAGAGGUUUACAUUUCUCACAGUGAACCGGUCUCUUUUUCAGCUGCUUC
>ppa-mir-130a_MI0002527
UGCUGCUGGCCAGAGCUCUUUUCACAUUGUGCUACUGUCUGCACCUGUCACUAGCAGUGCAAUGUUAAAAGGGCAUUGGCCGUGUAGUG
>ppa-mir-133a_MI0002530
ACAAUGCUUUGCUAGAGCUGGUAAAAUGGAACCAAAUCGCCUCUUCAAUGGAUUUGGUCCCCUUCAACCAGCUGUAGCUAUGCAUUGA
>ppa-mir-135-1_MI0002539
AGGCCUCGCUGUUCUCUAUGGCUUUUUAUUCCUAUGUGAUUCUACUGCUCACUCAUAUAGGGAUUGGAGCCGUGGCGCACGGCGGGGACG

Drmirdeep commented 2 weeks ago

This message occurs because your IDs are simply too long and not even close to the IDs in the tutorial files (please have a look).

The length of the IDs is used to determine the output arrangement in the PDF and .mrd files, because a spacer is inserted between the ID and the score. IDs should not exceed 40 characters (which is never a problem if proper IDs are used).
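To make the length limit concrete, here is a minimal Python sketch of the padding idea described above. The limit of 40 comes from this reply; the helper names `spacer_width` and `find_long_ids` and the exact padding arithmetic are assumptions for illustration, not code taken from quantifier.pl. In Perl, a repetition such as `' ' x $n` with a negative `$n` yields an empty string and, with warnings enabled, prints "Negative repeat count does nothing", which is consistent with the message reported here.

```python
MAX_ID_LENGTH = 40  # limit stated in the reply above


def spacer_width(seq_id, column=MAX_ID_LENGTH):
    """Padding needed to reach the column; negative when the ID is too long.

    Assumed arithmetic for illustration only, not copied from quantifier.pl.
    """
    return column - len(seq_id)


def find_long_ids(fasta_lines, limit=MAX_ID_LENGTH):
    """Return FASTA IDs (header text up to the first whitespace) longer than limit."""
    long_ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if len(seq_id) > limit:
                long_ids.append(seq_id)
    return long_ids


# Two headers: one in the underscore-joined style from this thread, one short.
example = [
    ">hsa-let-7a-5p_MIMAT0000062_Homo_sapiens_let-7a-5p",
    "UGAGGUAGUAGGUUGUAUAGUU",
    ">hsa-let-7a-5p",
    "UGAGGUAGUAGGUUGUAUAGUU",
]

for seq_id in find_long_ids(example):
    print(f"too long ({len(seq_id)} chars, spacer {spacer_width(seq_id)}): {seq_id}")
```

Running a check like this over the mature, hairpin and reads files before starting miRDeep2 should flag any ID that would make the spacer width negative.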

This happened because you just replaced whitespace with underscores, which turns the whole header line into one long ID. It is simpler to download the miRNA files from miRBase and use the included extract_miRNAs.pl script.

E.g.

extract_miRNAs.pl mature.fa hsa > hsa_mature.fa

The same applies to the precursors.
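For illustration only, the following Python sketch shows one way to get short IDs from an unmodified miRBase-style file (headers still containing spaces): keep the first whitespace-delimited field of each header and only the records whose ID starts with the chosen species prefix. This is not the implementation of extract_miRNAs.pl, whose internals are not shown in this thread; the function name `short_id_records` is hypothetical, and the second sample record uses a placeholder sequence.

```python
def short_id_records(fasta_lines, species):
    """Yield (short_id, sequence) for records whose ID starts with species + '-'.

    The short ID is the first whitespace-delimited field of the header line.
    """
    prefix = species + "-"
    current_id = None  # None while skipping a record of another species
    chunks = []
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if current_id is not None:
                yield current_id, "".join(chunks)
            fields = line[1:].split()
            first = fields[0] if fields else ""
            current_id = first if first.startswith(prefix) else None
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        yield current_id, "".join(chunks)


# Sample input in miRBase header style; the second sequence is a placeholder.
sample = [
    ">hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p",
    "UGAGGUAGUAGGUUGUAUAGUU",
    ">ppa-mir-141 MI0002490",
    "ACGU",
]

for seq_id, seq in short_id_records(sample, "hsa"):
    print(f">{seq_id}\n{seq}")
```

IDs produced this way stay well under the 40-character limit, unlike IDs built by joining the whole header with underscores.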