vmaffei / dada2_to_picrust

Experimental pipeline to perform de novo PICRUSt on de-noised amplicon sequence variants (ASV)

utf8 codec Issue in both Ubuntu & Fedora #3

Closed by josemseoane 7 years ago

josemseoane commented 7 years ago

Dear @vmaffei,

Thanks a lot for sharing this workflow, I think it is brilliant! I just wanted to let you know that I have experienced issues when running it on both Fedora and Ubuntu. Both machines have working QIIME installations, previously checked with the standard QIIME protocols. The R section ran without problems. However, when running the first alignment in section 2, line 1:

$ align_seqs.py -e 90 -p 0.1 -i ./genome_prediction/gg_13_5_study_db.fasta -o ./genome_prediction/gg_13_5_study_db.fasta.aligned

I always get this error:

Traceback (most recent call last):
  File "/usr/bin/align_seqs.py", line 211, in <module>
    main()
  File "/usr/bin/align_seqs.py", line 194, in main
    log_path=log_path, failure_path=failure_path)
  File "/home/jmseoane/.local/lib/python2.7/site-packages/qiime/align_seqs.py", line 266, in __call__
    temp_dir=get_qiime_temp_dir())
  File "/home/jmseoane/.local/lib/python2.7/site-packages/pynast/util.py", line 812, in pynast_seqs
    for seq, status in pynast_iterator:
  File "/home/jmseoane/.local/lib/python2.7/site-packages/pynast/util.py", line 599, in ipynast_seqs
    for seq_id, seq in candidate_sequences:
  File "/home/jmseoane/.local/lib/python2.7/site-packages/skbio/parse/sequences/fasta.py", line 136, in parse_fasta
    for rec in finder(infile):
  File "/home/jmseoane/.local/lib/python2.7/site-packages/skbio/parse/record_finder.py", line 145, in parser
    l = str(l.decode('utf-8'))
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte
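For anyone who lands here with the same message: byte 0x8b at position 1 matches the gzip magic number (0x1f 0x8b), which is a strong hint that a gzip-compressed file is being read as text. The sketch below reproduces that symptom on a small in-memory example. It is written for Python 3 rather than the Python 2.7 shown in the traceback, and the FASTA record and the helper name `first_undecodable_byte` are made up for illustration; they are not part of QIIME, PyNAST, or the pipeline.

```python
import gzip


def first_undecodable_byte(data):
    """Return (position, byte value, reason) for the first UTF-8 decoding
    failure in `data`, or None if the bytes decode cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start], exc.reason
    return None


# Hypothetical FASTA record, used only for illustration.
plain = b">seq1\nACGT\n"
compressed = gzip.compress(plain)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
print(compressed[:2] == b"\x1f\x8b")

# 0x1f is a valid one-byte UTF-8 character, so decoding only fails at
# position 1, on 0x8b, which cannot start a UTF-8 sequence.
print(first_undecodable_byte(compressed))
print(first_undecodable_byte(plain))
```

The same check on a real file amounts to reading its first two bytes in binary mode and comparing them with 0x1f 0x8b.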

This issue is related to compression. I fixed it by running the "cat" command in R on the uncompressed gg_13_5.fasta file instead of the original gg_13_5.fasta.gz:

system('cat gg_13_5.fasta >> gg_13_5_study_db.fasta')
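As a rough alternative to decompressing by hand, the append step could check for the gzip magic number and decompress on the fly. This is only a sketch under assumptions: the function names `is_gzipped` and `append_fasta` are hypothetical and not part of the pipeline, and the file names in the usage comment are simply the ones mentioned in this thread.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def append_fasta(src, dest):
    """Append the contents of `src` to `dest`, decompressing `src` first
    if it is gzip-compressed. Like `cat src >> dest`, this appends, so
    running it twice duplicates the sequences in `dest`."""
    opener = gzip.open if is_gzipped(src) else open
    with opener(src, "rb") as reader, open(dest, "ab") as writer:
        shutil.copyfileobj(reader, writer)


# Hypothetical usage with the file names from this thread:
# append_fasta("gg_13_5.fasta.gz", "gg_13_5_study_db.fasta")
```

Checking the leading bytes rather than the .gz extension also covers a compressed file that has been renamed without being decompressed.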

All the rest worked just fine, thanks a lot!

Jose

vmaffei commented 7 years ago

Hey @josemseoane! I'm glad you were able to figure out the issue. Thanks so much for sharing. Yes, the pipeline requires gg_13_5.fasta, not gg_13_5.fasta.gz. I'll double-check the pipeline code to make sure the correct file is listed.

josemseoane commented 7 years ago

Thanks to you @vmaffei for sharing the pipeline!