lorrainea / MARS

MARS: improving Multiple circular sequence Alignment using Refined Sequences
GNU General Public License v3.0

Error: unexpected character in input file, but no illegal characters are in the fasta file. #2

Closed ejurga closed 6 years ago

ejurga commented 6 years ago

I am trying to run MARS using two circular mitochondrial genomes. I have verified that each sequence contains only the characters ATGC. Nevertheless, I get the following error:

Error: input file /path/to/file contains an unexpected character

The fasta files and sequences appear to be formatted properly, and no discernible unexpected characters were found.

Is this a possible bug, or is there something that I am missing?

The command I ran is as follows, using the latest version as of April 26, 2018:

MARS-master$ ./mars -i ~/path/to/input.fasta -o ~/path/to/output -a DNA

I have attached the file containing the sequences I am trying to align: input.fasta

Thank you.

solonas13 commented 6 years ago

Can you please try to run the attached? input.clean.fasta.zip

I just copied and pasted the sequences in vim on my Linux machine and it works fine. I suspect that this is an encoding issue with your editor. May I ask which OS and editor you are using?

Best, Solon.
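The encoding suspicion above can be checked at the byte level, since editors often hide characters such as carriage returns or a byte-order mark. The following is a minimal sketch, not the check MARS itself performs: the function name `scan_fasta_bytes` and the strict uppercase `ACGT` alphabet are assumptions made for illustration.

```python
from collections import Counter


def scan_fasta_bytes(data: bytes, alphabet: bytes = b"ACGT") -> dict:
    """Report byte-level features of FASTA data that an editor may hide.

    Hypothetical helper for illustration; it does not reproduce MARS's parser.
    """
    report = {
        # A UTF-8 byte-order mark is invisible in most editors.
        "has_utf8_bom": data.startswith(b"\xef\xbb\xbf"),
        # Number of Windows-style (CRLF) line endings.
        "crlf_line_endings": data.count(b"\r\n"),
        # Byte value -> count, for bytes in sequence lines outside the alphabet.
        "unexpected": Counter(),
    }
    body = data[3:] if report["has_utf8_bom"] else data
    for line in body.split(b"\n"):
        if line.startswith(b">"):
            continue  # header lines may contain arbitrary text
        for byte in line:
            if byte not in alphabet:
                report["unexpected"][byte] += 1
    return report


# A CRLF-terminated file shows carriage returns (byte 13) in its sequence lines.
sample = b">seq1\r\nACGT\r\n>seq2\r\nAC\r\n"
print(scan_fasta_bytes(sample))
```

To scan a real file, read it in binary mode (for example `open(path, "rb").read()`) so that Python does not translate line endings before the check runs.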

ejurga commented 6 years ago

Solon,

Your files did solve my particular error. This is interesting, since I also use Linux (Ubuntu 16.04.3 LTS) and vim as my text editor (although they are being run in a virtual machine).

Unfortunately, your files still do not go to completion. I get the following upon running MARS with your attached files:

Reading the (Multi)FASTA input file: ../../Alignment_Files/solon_files.fasta
Computing cyclic edit distance for all sequence pairs
Creating the guide tree
Starting progressive alignment
Killed

The program exits and leaves me with no output file. As before, I am not entirely sure why this is.

Thanks,

Emil.

lorrainea commented 6 years ago

Hi Emil

This is just a result of running out of memory due to the length of the sequences. Since you only have two sequences, I would suggest using hCED rather than MARS: MARS is designed for multiple sequences, whereas hCED works on sequence pairs. (MARS is built on hCED, so the initial step of MARS is hCED anyway.)

You can install this from here: https://github.com/lorrainea/hCED and use the command ./hCED -i solon_files.fasta -o OUT -l 50 -R 1

Lorraine
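To get a feel for why sequence length can exhaust memory, here is a rough back-of-the-envelope sketch. It assumes a full quadratic dynamic-programming table with 4-byte cells, which is a common layout for pairwise alignment but is not confirmed by the thread as the way MARS allocates memory. The function name and the sequence lengths are placeholders, not values taken from the attached files.

```python
def dp_table_bytes(len_a: int, len_b: int, bytes_per_cell: int = 4) -> int:
    """Bytes needed for one (len_a + 1) x (len_b + 1) table.

    Assumption for illustration only: a full quadratic table is stored.
    """
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


# Placeholder lengths; substitute the real lengths of your sequences.
needed = dp_table_bytes(16_000, 16_000)
print(f"{needed / 1024**3:.2f} GiB for a single table")
```

Under this assumption, memory grows with the product of the two lengths, so doubling both sequences roughly quadruples the requirement, and a virtual machine with limited RAM would hit the limit sooner.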

DMatsliyah commented 1 year ago

Writing here in case this helps anyone who needs it.

I had the same issue, and the problem was encoding: the FASTA file should not only be UTF-8 encoded but also use Linux-style (LF) line endings. Converting it in Notepad++ solved it for me.
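For anyone without Notepad++, the same conversion can be sketched in a few lines of Python. This is one possible approach, assuming the file is already ASCII or UTF-8 text. The helper names `normalise_fasta_bytes` and `convert_file` are made up for this example.

```python
from pathlib import Path


def normalise_fasta_bytes(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and convert CRLF or CR line endings to LF."""
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    # Replace CRLF first so that a lone CR pass does not double the newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Write a normalised copy of src to dst (hypothetical paths)."""
    Path(dst).write_bytes(normalise_fasta_bytes(Path(src).read_bytes()))


print(normalise_fasta_bytes(b"\xef\xbb\xbf>seq1\r\nACGT\r\n"))
```

Writing to a separate output path keeps the original file intact, which makes it easy to compare the two if the error persists.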