rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

miRDeep2.pl error when loading reference genome #89

Closed ISonets closed 2 years ago

ISonets commented 2 years ago

Greetings! I have encountered an error when trying to use miRDeep2.pl. I am trying to discover miRNAs in cucumber. I successfully did the mapping and obtained the reads_collapsed.fa and .arf files, but when starting mirdeep2 this error occurred:

miRDeep2 started at 12:8:41

mkdir mirdeep_runs/run_08_11_2021_t_12_08_41

readline() on closed filehandle IN at /aludisk/home/isonets/miniconda3/bin/miRDeep2.pl line 333.
Use of uninitialized value $line in scalar chomp at /aludisk/home/isonets/miniconda3/bin/miRDeep2.pl line 334.
Use of uninitialized value $line in pattern match (m//) at /aludisk/home/isonets/miniconda3/bin/miRDeep2.pl line 335.
Error: The first line of file /Cucumber_ref_genome/cuc_ref_genome.fna does not start with '>identifier'
Genome file /Cucumber_ref_genome/cuc_ref_genome.fna is not a fasta file

I am using this assembly with no modifications whatsoever: https://www.ncbi.nlm.nih.gov/assembly/GCF_000004075.2/ I grepped for '>' and all the header lines are present, so the FASTA file looks correct, but this error occurs again and again. Please help me fix this problem. P.S. mirdeep2 was installed via conda, and it is the latest version.

mschilli87 commented 2 years ago

Could you please install mirdeep2 via one of the officially supported ways mentioned in the README, using the latest master branch on GitHub as a base, and see if you can reproduce this? Also, please ensure that you can successfully run the tutorial provided alongside mirdeep2 to rule out any issues with your setup.

Finally, once you have ruled out an installation issue, please share the exact command line you use to run mirdeep2 and copy/paste the output of

head -n2 cuc_ref_genome.fna | tee >(od -c)

so we can check the actual format used for your input data.

ISonets commented 2 years ago

Got it, will do ASAP to find out the issue.

Drmirdeep commented 2 years ago

Please also be aware that miRDeep2 is for animal miRNA prediction only. Plant miRNAs have different secondary structures and will thus likely not be predicted properly at all.

ISonets commented 2 years ago

Yes, I know that mirdeep2 isn't very suitable for plant miRNA prediction, but for the sake of testing and curiosity it will do the job (it should also be noted that your tool is still supported and updated, while many other tools have been abandoned). I think I have found the origin of the issue. It seems that mirdeep2 doesn't like spaces in contigs names. Simplifying headers solved this problem. However, I have run into other problems (with processing raw miRNA-seq data), but I'll contact you about them later if my own efforts fail.

mschilli87 commented 2 years ago

It seems that mirdeep2 doesn't like spaces in contigs names. Simplifying headers solved this problem.

This is indeed a documented requirement. Note that there is even a script to check the genome file for validity that is also listed in the README. :wink:
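
For anyone hitting the same error, the header simplification described above could look roughly like the following. This is only a minimal sketch, not part of mirdeep2: the function name `simplify_fasta_headers` and the example identifiers are made up for illustration, and it assumes that keeping only the text before the first whitespace in each header still leaves the identifiers unique (it raises an error otherwise).

```python
import io


def simplify_fasta_headers(src, dst):
    """Copy FASTA text from src to dst, cutting each header at the first whitespace.

    Hypothetical helper, not part of mirdeep2. Returns the number of header
    lines that were changed.
    """
    changed = 0
    seen = set()
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            if not fields:
                raise ValueError("FASTA header without an identifier")
            ident = fields[0]
            if ident in seen:
                # Truncation must not merge two different sequences.
                raise ValueError("duplicate identifier after truncation: " + ident)
            seen.add(ident)
            if line.rstrip("\r\n") != ">" + ident:
                changed += 1
            dst.write(">" + ident + "\n")
        else:
            # Sequence lines are copied through untouched.
            dst.write(line)
    return changed


if __name__ == "__main__":
    # Made-up example records; real input would come from open("genome.fna").
    example = io.StringIO(">chr1 some free-text description\nACGT\n>chr2\nTTGA\n")
    cleaned = io.StringIO()
    n_changed = simplify_fasta_headers(example, cleaned)
    print(n_changed, "header(s) simplified")
    print(cleaned.getvalue(), end="")
```

With real files, `src` and `dst` would be file objects opened on the original and the cleaned genome. The cleaned file is the one to pass to miRDeep2.pl, and the genome index used for mapping would presumably need to be rebuilt from it as well, so that identifiers match across all inputs.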