rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

miRDeep2 throws errors due to having faulty input files or a faulty installation #91

Closed mickey-spongebob closed 2 years ago

mickey-spongebob commented 2 years ago

Hi 👋 ,

Hope this finds you well! I am writing to ask for help with my current miRDeep2 analysis. The installation seems to work fine: all commands run as expected, including the initial tests, and all the required software is installed correctly. I run the following steps:

1. build the index

bowtie-build genome.fa genome

2. process and map reads to reference

mapper.pl smallRNAs.fq -e -h -m -l 18 -p genome \
  -s collapsed.fa -t collapsed_genome.arf -v -o 100

3. predict miRNA

miRDeep2.pl collapsed.fa genome.fa collapsed_genome.arf \
  none none none 2>report.log

Steps 1 and 2 work fine, with no errors, and all the expected output files are present. However, step 3 gives an error:

started: 16:40:39 ESC[1;31mError: ESC[0mproblem with collapsed.fa

Strangely, when I do check the file using 'sanity_check_reads_ready_file.pl collapsed.fa', it doesn't give an error.

Any ideas on what I could be doing wrong? I apologise if this is redundant or something silly, but I can't seem to figure it out :-(

Any advice would be kindly appreciated and thank you for the really nice tool!

Best, kevin

mschilli87 commented 2 years ago

@mickey-spongebob:

Hi Kevin,

Could you check whether you can reproduce this with only the first 10 lines of collapsed.fa?
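A minimal sketch of how such a subset could be produced. `make_subset` and the output file name are hypothetical, and 10 lines correspond to 5 reads only if every record spans exactly two lines (header plus sequence).

```shell
# Hypothetical helper (not part of miRDeep2): copy the first N lines of a
# FASTA file into a new file so the failing step can be rerun on a tiny input.
make_subset() {
    # $1 = input FASTA, $2 = number of lines to keep, $3 = output file
    head -n "$2" "$1" > "$3"
}
```

For example, `make_subset collapsed.fa 10 collapsed.head10.fa`, then rerun step 3 with `collapsed.head10.fa` in place of `collapsed.fa`.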

Best, Marcel

mickey-spongebob commented 2 years ago

Hi Marcel,

Thanks for the reply. I've tried it with the first 10 and 20 lines of collapsed.fa, and it still isn't working :-( I get the exact same error as above.

Here are the first 10 lines of the 'collapsed.fa' file:

seq_0_x11922267
TGACTAGATCCACACTCATCC
seq_11922267_x10117130
AACTCTGAGCGGTGGATCACTCGGCTCGTGCGTCGATGAAGAGCGCAGCCAGCTGCGAGAAGTGATGTGAAT
seq_22039397_x9348851
AATGGCACTGGTAGAATTCACGG
seq_31388248_x4446651
TGGAATGTAAAGAAGTATGTAG
seq_35834899_x2568266
CAACTCTGAGCGGTGGATCACTCGGCTCGTGCGTCGATGAAGAGCGCAGCCAGCTGCGAGAAGTGATGTGAA

Could it be how I'm pre-processing my reads? In summary: we receive "*fq.gz" files from the small RNA sequencing. I first concatenate all read files, which amount to approx. 94GB, and then remove the adapters using AdapterRemoval https://adapterremoval.readthedocs.io/en/stable/. I then remove an extra 4 base pairs that were used for barcoding, which gives the 'smallRNAs.fq' file I described above. Running bowtie-build and the mapping then goes seemingly fine, and only the prediction fails :-(

Sorry for all the trouble and thank you once more for the response :-) Any more advice would be super helpful :-)

Best, kevin

mschilli87 commented 2 years ago

At first glance this looks fine to me, except for the missing > on the FASTA header lines, but my guess is these just got mixed up with markdown quotations?

What genome are you running this on? If I can reproduce your genome.fa and collapsed_genome.arf I could try running miRDeep2.pl collapsed.fa genome.fa collapsed_genome.arf none none none with collapsed.fa containing

>seq_0_x11922267
TGACTAGATCCACACTCATCC
>seq_11922267_x10117130
AACTCTGAGCGGTGGATCACTCGGCTCGTGCGTCGATGAAGAGCGCAGCCAGCTGCGAGAAGTGATGTGAAT
>seq_22039397_x9348851
AATGGCACTGGTAGAATTCACGG
>seq_31388248_x4446651
TGGAATGTAAAGAAGTATGTAG
>seq_35834899_x2568266
CAACTCTGAGCGGTGGATCACTCGGCTCGTGCGTCGATGAAGAGCGCAGCCAGCTGCGAGAAGTGATGTGAA

myself. Without reproducing your problem, I'll have a hard time helping any further.
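The reads listed above alternate between a header of the form `>xxx_<index>_x<count>` and a sequence line. A rough format check along those lines could look like the sketch below. `check_collapsed_fasta` is a hypothetical helper, its pattern is inferred only from the sample reads in this thread, and it is not a replacement for miRDeep2's own sanity_check_reads_ready_file.pl.

```shell
# Hypothetical helper (not part of miRDeep2): verify that a file alternates
# between ">xxx_<index>_x<count>" header lines and nucleotide sequence lines.
# The header pattern is an assumption inferred from the sample reads above.
check_collapsed_fasta() {
    # $1 = FASTA file to check; exits non-zero if any line looks wrong
    awk '
        NR % 2 == 1 && $0 !~ /^>[A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]_[0-9]+_x[0-9]+$/ {
            printf "line %d: unexpected header: %s\n", NR, $0; bad = 1
        }
        NR % 2 == 0 && $0 !~ /^[ACGTUNacgtun]+$/ {
            printf "line %d: unexpected sequence: %s\n", NR, $0; bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Because both patterns are anchored at the end of the line, a trailing carriage return or a missing `>` would be reported with its line number.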

Also, can you confirm that the tutorial included with miRDeep2 runs fine using your installation? And could you pipe the first 10 lines through od -c and check if there are any unusual non-printable characters? For example, if you see \r\n line breaks instead of simple \n ones, dos2unix might fix your issue.
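The manual inspection would be `head -n 10 collapsed.fa | od -c`. As a scripted variant, the sketch below only reports whether a carriage return occurs in the first lines of a file. `has_cr` is a hypothetical helper name, and it detects `\r` only, not other non-printable characters.

```shell
# Hypothetical helper: succeed (exit 0) if a carriage return appears
# anywhere in the first N lines of a file (N defaults to 10).
has_cr() {
    # $1 = file, $2 = number of lines to inspect
    count=$(head -n "${2:-10}" "$1" | tr -cd '\r' | wc -c | tr -d ' ')
    [ "$count" -gt 0 ]
}
```

For example, `has_cr collapsed.fa && echo "CR found, consider dos2unix"`.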

Drmirdeep commented 2 years ago

It looks like the things you posted are either not from a miRDeep2 installation from the GitHub repo, or you copy-pasted just some lines and left out others.

However, I don't get any errors using the '5' reads from above.

If it's not a secret then please post the full screen output here including the command you are using to call miRDeep2.

mickey-spongebob commented 2 years ago

Hi

At first glance this looks fine to me, except for the missing > on the FASTA header lines, but my guess is these just got mixed up with markdown quotations?

Yup, the files actually do have the '>' so it is a mix up with the markdown :-)

What genome are you running this on? If I can reproduce your genome.fa and collapsed_genome.arf I could try running miRDeep2.pl collapsed.fa genome.fa collapsed_genome.arf none none none with collapsed.fa containing

I'm running this on a Platynereis dumerilii genome which we have recently assembled and are annotating :-)

>seq_0_x11922267
TGACTAGATCCACACTCATCC
>seq_11922267_x10117130
AACTCTGAGCGGTGGATCACTCGGCTCGTGCGTCGATGAAGAGCGCAGCCAGCTGCGAGAAGTGATGTGAAT
>seq_22039397_x9348851
AATGGCACTGGTAGAATTCACGG
>seq_31388248_x4446651
TGGAATGTAAAGAAGTATGTAG
>seq_35834899_x2568266
CAACTCTGAGCGGTGGATCACTCGGCTCGTGCGTCGATGAAGAGCGCAGCCAGCTGCGAGAAGTGATGTGAA

myself. Without reproducing your problem, I'll have a hard time helping any further.

Also, can you confirm that the tutorial included with miRDeep2 runs fine using your installation? And could you pipe the first 10 lines through od -c and check if there are any unusual non-printable characters? For example, if you see \r\n line breaks instead of simple \n ones, dos2unix might fix your issue.

I just checked the installation by running the tutorial dataset, and it does indeed fail with "Error: problem with mature_ref_this_species.fa". I tried running 'sanity_check_mature_ref.pl mature_ref_this_species.fa' directly, and it doesn't output an error, which is confusing.

So far I've tried two installations: (1) the Conda version, which I quickly abandoned due to several errors, and (2) the one installed on our local cluster, which I assumed worked well, but perhaps I should ask them to re-install the software.

Maybe I'll get them to re-install before I bother you again on this issue, so I'll close it for now and re-open it if the issue persists :-)

Thank you for your patience and I'll let you know how it went!

Best, kevin

mickey-spongebob commented 2 years ago

It looks like the things you posted are either not from a miRDeep2 installation from the GitHub repo, or you copy-pasted just some lines and left out others.

However, I don't get any errors using the '5' reads from above.

If it's not a secret then please post the full screen output here including the command you are using to call miRDeep2.

Not a secret, here is the output, and the exact commands are as follows:

Load miRDeep2 software

module load miRDeep2/0.1.3-foss-2019b-Python-3.7.4

build the index

bowtie-build pdumv2.fa pdumv2

process and map reads to reference

mapper.pl pdum_trim_clip_smallRNAs.fq -e -h -i -j -m -l 18 -p pdumv2 \
  -s pdum_all_filt_collapsed.fa -t pdum_all_collapsed_genome.arf -v

predict miRNA

miRDeep2.pl pdum_all_filt_collapsed.fa pdumv2.fa pdum_all_collapsed_genome.arf \
  none cte.fas none 2>report.log

Here is the report.log output:

Starting miRDeep2

/g/easybuild/x86_64/CentOS/7/rome/software/miRDeep2/0.1.3-foss-2019b-Python-3.7.4/bin/miRDeep2.pl pdum_all_filt_collapsed.fa pdumv2.fa pdum_all_collapsed_genome.arf none cte.fas none

miRDeep2 started at 14:13:28

mkdir mirdeep_runs/run_23_11_2021_t_14_13_28

testing input files

started: 14:14:17 sanity_check_mature_ref.pl cte.fas

ESC[1;31mError: ESC[0mproblem with cte.fas
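The `ESC[1;31m` and `ESC[0m` fragments in the log look like ANSI color escape sequences (bold red, then reset) that were written to report.log along with the message. A small sketch for stripping them when reading the log follows. `strip_ansi` is a hypothetical helper name, and it only handles color sequences ending in `m`.

```shell
# Hypothetical helper: remove ANSI color escape sequences (ESC [ ... m)
# from the given files, or from standard input if no file is given.
strip_ansi() {
    esc=$(printf '\033')                 # the literal ESC byte
    sed "s/${esc}\[[0-9;]*m//g" "$@"
}
```

For example, `strip_ansi report.log` prints the log with the error line reduced to plain text.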

But as I mentioned above, I will re-install myself and then get back to the thread, should the issue persist :-)

If it's ok with everyone, I shall close this thread and keep you updated :-)

Thank you for the help :-)

mickey-spongebob commented 2 years ago

Hi all,

So I just confirmed that neither of the two installations worked: the one via Conda, and the one installed on the cluster (which I can't comment on). However, an installation on a local computer works just fine (it also works on a virtual machine with enough RAM and storage).
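The thread does not establish why the Conda and cluster installations failed. One cheap first check on any installation is whether the expected executables are on the PATH at all, as sketched below. `missing_tools` is a hypothetical helper, and the tool list in the usage note is an assumption: `RNAfold` and `randfold` come from miRDeep2's documented dependencies, not from this thread.

```shell
# Hypothetical helper: print the name of every given tool that cannot be
# found on the current PATH; prints nothing if all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools miRDeep2.pl mapper.pl bowtie-build RNAfold randfold` lists whichever of these are not found. An empty result does not prove the installation is healthy, only that the programs can be located.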

Sorry for the trouble and thank you @mschilli87 and @Drmirdeep for the patience and help :-)

Closing again :-)

Best, kevin