rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

No whitespace in my .fa, yet I get a whitespace error in miRDeep2.pl #86

Closed timdelory closed 2 years ago

timdelory commented 2 years ago

I am running miRDeep2.pl on a species for which I have a FASTA file of known mature miRNAs. However, the script never completes, because I keep receiving a whitespace error: my_mature_mirna.fa "has not allowed whitespaces in its first identifier". There is no trailing whitespace in this FASTA file. To be sure, I removed any potential spaces using sed -i 's/ //g' (it did not change the file size). Here are the first few records; perhaps something about the header names is causing the error? Thank you for your time!

>GL985793_7038
TGAGATCATTGTGAAAGCTGATT
>GL986159_20383
TGGAATGTAAAGAAGTATGGAG
>GL985803_8126
ATATTGTCCTGTCACAGCAGTAC
>GL985734_3508
TGAGATTCAACTCCTCCAACTT
>GL985752_4957
TGAGTATTAATTCAGGTACTGGT

mschilli87 commented 2 years ago

Can you pipe your FASTA file through od -c to check for non-printable characters? What kind of linebreaks does the file use (\n/\r\n)?
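The check suggested above can be sketched roughly as follows, assuming a POSIX shell with od and grep available. The helper name count_cr_lines is hypothetical and not part of miRDeep2; the file name my_mature.fa is the one used in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of miRDeep2): print the number of lines in
# file $1 that end with a carriage return (\r), i.e. Windows-style \r\n breaks.
count_cr_lines() {
    cr=$(printf '\r')            # a literal carriage-return character
    grep -c "${cr}\$" "$1"       # count lines whose last character is \r
}

# Example usage (file name taken from the thread):
#   od -c my_mature.fa | head -n 3     # look for \r before each \n
#   count_cr_lines my_mature.fa        # 0 means no \r\n line breaks
```

Note that sed 's/ //g' only removes the space character, so a file can be free of spaces and still contain carriage returns, which many tools treat as whitespace.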

timdelory commented 2 years ago

Here are the head and tail of the od -c output you asked for:

[u60243]$ cat my_mature.fa | od -c | head
0000000 > G L 9 8 5 7 9 3 _ 7 0 3 8 \r \n
0000020 T G A G A T C A T T G T G A A A
0000040 G C T G A T T \r \n > G L 9 8 6 1
0000060 5 9 _ 2 0 3 8 3 \r \n T G G A A T
0000100 G T A A A G A A G T A T G G A G
0000120 \r \n > G L 9 8 5 8 0 3 _ 8 1 2 6
0000140 \r \n A T A T T G T C C T G T C A
0000160 C A G C A G T A C \r \n > G L 9 8
0000200 5 7 3 4 _ 3 5 0 8 \r \n T G A G A
0000220 T T C A A C T C C T C C A A C T

[u60243]$ cat my_mature.fa | od -c | tail
0007440 A T A T C C G T T G A T C G T A
0007460 G T \r \n > G L 9 8 6 7 0 2 _ 2 5
0007500 1 1 3 \r \n A G G T A A T C G T C
0007520 G G T G T T T T C G \r \n > G L 9
0007540 8 5 8 3 8 _ 1 0 2 9 1 \r \n T C T
0007560 C A C T A T C T T G T C T T T C
0007600 A T C \r \n > G L 9 8 5 9 4 1 _ 1
0007620 4 2 5 6 \r \n C T T T G G T A A T
0007640 A C A G C T C T A T G A \r \n

timdelory commented 2 years ago

The problem must be that I have \r\n line breaks instead of \n line breaks. I will follow up after reformatting.
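One way to do that conversion is sketched below, assuming a POSIX shell with tr available. The helper name strip_cr and the output file name are hypothetical; the thread does not say which tool was actually used. The dos2unix utility, where installed, is a common alternative.

```shell
#!/bin/sh
# Hypothetical helper (not part of miRDeep2): copy file $1 to file $2 with
# every carriage return (\r) deleted, turning \r\n line breaks into \n.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example usage (input name from the thread, output name is an assumption):
#   strip_cr my_mature.fa my_mature.unix.fa
```

Writing to a new file rather than editing in place keeps the original around for comparison. Note that tr -d deletes every carriage return, not only those at line ends, which should be harmless for a FASTA file.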

timdelory commented 2 years ago

Thank you! That was the issue.