sanger-pathogens / iva

de novo virus assembler of Illumina paired reads
http://sanger-pathogens.github.io/iva/

Blank sequence causes TypeError #86

Open donkirkby opened 5 years ago

Thanks for publishing this assembly tool; it's been very useful for us. I wanted to let you know about a problem we ran into with some bad input data that was hard to track down.

If one of the input reads has a blank sequence, the pysam reader returns it as None instead of an empty string, and iva then fails with the following error:

$ iva -f 2140A-HCV_S17_L001_R1_001.fastq -r 2140A-HCV_S17_L001_R2_001.fastq scratch
Traceback (most recent call last):
  File "/mnt/data/don/git/MiCall/venv_micall/bin/iva", line 286, in <module>
    assembly.read_pair_extend(reads_prefix, 'iteration')
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/iva/assembly.py", line 423, in read_pair_extend
    self._read_pair_extension_iterations(current_reads_prefix, out_prefix + '.' + str(i))
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/iva/assembly.py", line 358, in _read_pair_extension_iterations
    bases_added = self._extend_with_reads(reads_prefix, out_prefix + '.1', no_map_contigs)
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/iva/assembly.py", line 340, in _extend_with_reads
    bases_added = self._extend_contigs_with_bam(bam, out_prefix=reads_prefix)
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/iva/assembly.py", line 183, in _extend_contigs_with_bam
    print(mapping.sam_to_fasta(sam), file=fa_out1)
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/pyfastaq/sequences.py", line 420, in __str__
    return '>' + self.id + '\n' + '\n'.join(self.seq[i:i+Fasta.line_length] for i in range(0, len(self), Fasta.line_length))
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/pyfastaq/sequences.py", line 173, in __len__
    return len(self.seq)
TypeError: object of type 'NoneType' has no len()
$
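
The last two frames of the traceback can be reproduced in isolation. The sketch below is not pyfastaq itself: `FastaLike` is a hypothetical stand-in that copies only the `__str__` and `__len__` bodies shown in the traceback, and `try_format` is a helper invented for illustration. It shows that an empty string formats without error, while None triggers the same TypeError.

```python
class FastaLike:
    """Hypothetical stand-in for the pyfastaq record seen in the traceback."""

    line_length = 60  # assumed wrap width, for illustration only

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __len__(self):
        # Same body as the __len__ frame in the traceback.
        return len(self.seq)

    def __str__(self):
        # Same expression as the __str__ frame in the traceback.
        return '>' + self.id + '\n' + '\n'.join(
            self.seq[i:i + FastaLike.line_length]
            for i in range(0, len(self), FastaLike.line_length)
        )


def try_format(record):
    """Return the formatted record, or a description of the TypeError raised."""
    try:
        return str(record)
    except TypeError as error:
        return 'TypeError: ' + str(error)


blank_ok = try_format(FastaLike('read1', ''))      # empty string: header only
blank_none = try_format(FastaLike('read1', None))  # None: len() fails
print(repr(blank_ok))
print(blank_none)
```

So the record only needs an empty string in place of None for the formatting step to succeed.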

To reproduce this error, unzip the attached file and run iva on it as above. The file is a minimal example with about 30 Hepatitis C reads that will assemble successfully if you remove the last read.

Attachment: 2140A-HCV_S17.zip

It would be nice if either mapping.sam_to_fasta() checked for None and wrote out a blank sequence, or the earlier code checked for blank sequences and failed with a more helpful error message.
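
As a rough illustration of the two options, here is a minimal sketch. Both function names are hypothetical and not part of iva, pysam, or pyfastaq; where such a check would actually belong in the iva code is an assumption I have not verified.

```python
def normalise_sequence(seq):
    """Option 1: treat a missing sequence (None) as an empty string."""
    return '' if seq is None else seq


def require_sequence(read_name, seq):
    """Option 2: fail early with a message that names the offending read."""
    if not seq:  # catches both None and ''
        raise ValueError(
            'Read %r has a blank sequence; please remove it from the input.'
            % read_name
        )
    return seq


# Option 1 lets downstream formatting proceed with an empty sequence.
print(repr(normalise_sequence(None)))

# Option 2 stops with a message that points at the bad read.
try:
    require_sequence('read30', None)
except ValueError as error:
    message = str(error)
    print(message)
```

Either would have saved us the time spent tracing the TypeError back to the input file.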

Thanks again.