immcantation / presto

pRESTO is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq). pRESTO is a bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
https://presto.readthedocs.io
GNU Affero General Public License v3.0

Error with Greiff2014 V primers example #40

Closed: ssnn-airr closed this issue 8 years ago.

ssnn-airr commented 8 years ago

Original report by Scott Christley (Bitbucket: [Scott Christley](https://bitbucket.org/Scott Christley)).


This example used to work, but recently (possibly after a Biopython update) it fails when masking the V primers:

```python
  File "/usr/local/bin/MaskPrimers.py", line 644, in <module>
    maskPrimers(**args_dict)
  File "/usr/local/bin/MaskPrimers.py", line 475, in maskPrimers
    primers = readPrimerFile(primer_file)
  File "/usr/local/lib/python3.4/dist-packages/presto/IO.py", line 41, in readPrimerFile
    for p in primer_iter}
  File "/usr/local/lib/python3.4/dist-packages/presto/IO.py", line 40, in <dictcomp>
    primers = {p.description: str(p.seq).upper()
  File "/usr/local/lib/python3.4/dist-packages/Bio/SeqIO/__init__.py", line 591, in parse
    for r in i:
  File "/usr/local/lib/python3.4/dist-packages/Bio/SeqIO/FastaIO.py", line 124, in FastaIterator
    for title, sequence in SimpleFastaParser(handle):
  File "/usr/local/lib/python3.4/dist-packages/Bio/SeqIO/FastaIO.py", line 45, in SimpleFastaParser
    line = handle.readline()
  File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 582: ordinal not in range(128)
```
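The last frames show the primer file being decoded with the `ascii` codec, so the first byte at or above 0x80 aborts parsing before any primer is read. One plausible reason (not confirmed in the report) is that on Python 3.4 a text-mode file opened without an explicit encoding uses the locale's preferred encoding, which is ASCII under a `C`/`POSIX` locale. The stdlib-only sketch below reproduces the failure on an in-memory file. `read_primers` is a hypothetical, simplified stand-in for pRESTO's `readPrimerFile` (which goes through `Bio.SeqIO`), the sequences are placeholders, and the trailing bytes `0xC2 0xA0` (a UTF-8 non-breaking space) are an assumption: the traceback only shows the leading byte `0xc2`.

```python
import io

# Hypothetical primer FASTA bytes (placeholder sequences, not the real
# Greiff2014 primers). The trailing b"\xc2\xa0" is a UTF-8 non-breaking
# space, assumed here to stand in for the stray characters in the report.
RAW = b">primer_A\nacgtacgtac\n>primer_B\nTTGGCCAA\n\xc2\xa0\n"


def read_primers(data, encoding):
    """Parse FASTA bytes into {description: uppercase sequence}.

    Simplified, hypothetical stand-in for presto's readPrimerFile (which
    uses Bio.SeqIO); only the text decoding step is meant to match.
    """
    handle = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
    primers, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()  # also strips U+00A0, which str treats as whitespace
        if line.startswith(">"):
            if name is not None:
                primers[name] = "".join(chunks).upper()
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        primers[name] = "".join(chunks).upper()
    return primers


# Decoding as ASCII fails on the first byte >= 0x80, as in the traceback.
try:
    read_primers(RAW, "ascii")
except UnicodeDecodeError as err:
    print("ascii failed:", err)

# Decoding as UTF-8 succeeds; the stray character reduces to a blank line.
print("utf-8:", read_primers(RAW, "utf-8"))
```

Forcing UTF-8 only hides the symptom here; since a primer file should contain nothing but ASCII, removing the stray bytes from the file itself is the more robust fix.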

The problem seems to be with Greiff2014_VPrimers.fasta: there are some extra non-printable characters at the end of the file (the traceback points to byte 0xc2, which is not valid ASCII). If I delete those lines at the end, everything runs fine.
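To find characters like these before running MaskPrimers.py, one option is to scan the file as raw bytes, report anything outside 7-bit ASCII, and write a cleaned copy. The helpers `find_non_ascii` and `clean_primer_file` below are hypothetical (not part of pRESTO), and the cleanup assumes, as in this report, that the offending bytes are stray junk rather than meaningful content, so it drops every byte >= 0x80 along with any trailing blank lines.

```python
import os
import tempfile


def find_non_ascii(path):
    """Return (line_number, file_offset, byte_value) for every byte >= 0x80."""
    hits, offset = [], 0
    with open(path, "rb") as handle:  # binary mode: nothing is decoded
        for line_no, line in enumerate(handle, start=1):
            for i, byte in enumerate(line):  # iterating bytes yields ints
                if byte >= 0x80:
                    hits.append((line_no, offset + i, byte))
            offset += len(line)
    return hits


def clean_primer_file(src, dst):
    """Copy src to dst without bytes >= 0x80 or trailing blank lines.

    Assumes the non-ASCII bytes are junk (as in this report), not content.
    """
    with open(src, "rb") as handle:
        data = handle.read()
    cleaned = bytes(b for b in data if b < 0x80).rstrip() + b"\n"
    with open(dst, "wb") as handle:
        handle.write(cleaned)
    return cleaned.decode("ascii")


# Demo on a hypothetical file with a stray U+00A0 line at the end.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "primers.fasta")
    dst = os.path.join(tmp, "primers.clean.fasta")
    with open(src, "wb") as handle:
        handle.write(b">primer_A\nACGTACGTAC\n>primer_B\nTTGGCCAA\n\xc2\xa0\n\n")
    for line_no, offset, byte in find_non_ascii(src):
        print("line {}, offset {}: 0x{:02x}".format(line_no, offset, byte))
    print(clean_primer_file(src, dst), end="")
```

Offsets are 0-based from the start of the file; for a small file like a primer list, they are likely to line up with the `position` reported in the `UnicodeDecodeError`.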

ssnn-airr commented 8 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Thanks Scott. Nice catch. Fixed in 8ecee8d.