smirarab / sepp-refs

GNU General Public License v3.0

run_seqtools.py crashed: illegal characeters in sequence at line 1 #2

zhanxw closed this issue 4 years ago

zhanxw commented 4 years ago

I am following your instructions (https://github.com/smirarab/sepp-refs/tree/master/silva) to prepare the SILVA reference tree, but run_seqtools.py (obtained from https://github.com/smirarab/pasta) gave the following error:

run_seqtools.py -masksites 1977 -infile 99_otus_aligned.fasta -outfile 99_otus_aligned_masked1977.fasta

Traceback (most recent call last):
  File "../../pasta/run_seqtools.py", line 36, in <module>
    alg.read_file_object(args.infile,args.informat)
  File "/work/archive/PCDC/PCDC_Core/xzhan9/jun.chen/data/database/pasta/pasta/alignment.py", line 1335, in read_file_object
    for name, seq in read_func(file_obj):
  File "/work/archive/PCDC/PCDC_Core/xzhan9/jun.chen/data/database/pasta/pasta/alignment.py", line 75, in read_fasta
    raise Exception("Error: illegal characeters in sequence at line %d" % line_number)
Exception: Error: illegal characeters in sequence at line 1

Is the error caused by the "." characters in the FASTA file?
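To see which characters the FASTA reader is likely objecting to, a quick scan of the sequence lines can help. This is a minimal sketch under stated assumptions: the helper find_unexpected_chars and its allowed-character set (IUPAC nucleotide codes plus "-" as the gap symbol) are hypothetical, not PASTA's actual validation rule.

```python
import sys
from typing import Iterable, Iterator, Tuple

# Assumption: sequence lines may contain IUPAC nucleotide codes (either case)
# plus '-' as the gap symbol. PASTA's real rule may differ; adjust as needed.
ALLOWED = set("ACGTURYSWKMBDHVNacgturyswkmbdhvn-")


def find_unexpected_chars(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, unexpected characters) for each bad sequence line."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        bad = set(line) - ALLOWED
        if bad:
            yield number, "".join(sorted(bad))


if __name__ == "__main__":
    # Hypothetical usage: python find_unexpected_chars.py 99_otus_aligned.fasta
    with open(sys.argv[1]) as handle:
        for number, chars in find_unexpected_chars(handle):
            print(f"line {number}: {chars!r}")
```

On an alignment that uses "." for gaps, this would report "." on every sequence line that contains one.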

smirarab commented 4 years ago

Yes. When I did the analyses on the SILVA dataset, I used the following:

sed -e "s/\./-/g" 99_alignment.fna > 99_alignment_nodots.fasta

run_seqtools.py -masksites 2125 -infile 99_alignment_nodots.fasta -outfile 99_alignment_nodots.masked2125.fasta

cat 99_alignment_nodots.masked2125.fasta | awk '/>/ {a=gensub("-",".","g"); print a} /^ *[^>]/{print }' > 99_alignment_nodots.masked2125_corn.fasta

(Sent by email in reply to zhanxw's message of Tue, Jan 21, 2020 at 11:21 AM.)

-- Siavash Mirarab
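The sed command rewrites every line, headers included, which is presumably why the awk step afterwards turns "-" back into "." on header lines (gensub is a GNU awk extension, so that step needs gawk). As a hedged alternative, the minimal Python sketch below replaces dots only on sequence lines, so no header-restoring pass should be needed; it also avoids turning hyphens that were already in header lines into dots, which the awk step would do. The script and the helper name dots_to_gaps are hypothetical and not part of PASTA.

```python
import sys
from typing import Iterable, Iterator


def dots_to_gaps(lines: Iterable[str]) -> Iterator[str]:
    """Replace '.' with '-' on sequence lines only; header lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield line  # keep headers intact, including any dots or hyphens in IDs
        else:
            yield line.replace(".", "-")


if __name__ == "__main__":
    # Hypothetical usage:
    #   python dots_to_gaps.py < 99_alignment.fna > 99_alignment_nodots.fasta
    sys.stdout.writelines(dots_to_gaps(sys.stdin))
```

The output would then go to run_seqtools.py exactly as in the commands above, minus the final awk step.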

smirarab commented 4 years ago

For better readability:

sed -e "s/\./-/g" 99_alignment.fna > 99_alignment_nodots.fasta

run_seqtools.py -masksites 2125 -infile 99_alignment_nodots.fasta \
    -outfile 99_alignment_nodots.masked2125.fasta

cat 99_alignment_nodots.masked2125.fasta | \
    awk '/>/ {a=gensub("-",".","g"); print a} /^ *[^>]/{print }' \
    > 99_alignment_nodots.masked2125_corn.fasta
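The dots have to be converted first because run_seqtools.py reads the whole alignment before it masks anything, and the traceback shows its FASTA reader rejecting the input at that stage. For readers wondering what the masking step then does, the sketch below assumes that -masksites N drops every alignment column with fewer than N non-gap characters. That reading of the option is an assumption, and mask_gappy_sites is a hypothetical helper, not PASTA's API; check PASTA's source before relying on it.

```python
from typing import Dict

GAP = "-"  # gap symbol after the sed step has replaced '.'


def mask_gappy_sites(alignment: Dict[str, str], min_chars: int) -> Dict[str, str]:
    """Keep only columns that have at least `min_chars` non-gap characters.

    Assumption: this mirrors `run_seqtools.py -masksites N`; not verified.
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    # Indices of columns with enough non-gap characters, in their original order.
    keep = [
        col
        for col in range(length)
        if sum(seq[col] != GAP for seq in alignment.values()) >= min_chars
    ]
    # Rebuild every row from the surviving columns only.
    return {name: "".join(seq[col] for col in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    example = {"s1": "AC-G", "s2": "A--G", "s3": "-C-G"}
    print(mask_gappy_sites(example, 2))  # {'s1': 'ACG', 's2': 'A-G', 's3': '-CG'}
```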

zhanxw commented 4 years ago

Thanks!