nanoporetech / pinfish

Tools to annotate genomes using long read transcriptomics data

pinfish_polish fails when run on BAM generated from fasta file #25

Closed tleonardi closed 4 years ago

tleonardi commented 4 years ago

Hi, when the BAM file is generated from a fasta rather than fastq file, pinfish_polish fails due to racon exiting with code 134. Manually running the racon command gives the following error:

terminate called after throwing an instance of 'std::invalid_argument'
  what():  [bioparser::FastqParser] error: invalid file format!

This happens because the reads.fq and reference.fq files that pinfish_polish generates from the BAM file have strings of spaces (\x20) as Phred score lines. The underlying cause is that the BAM reader in biogo/hts returns seq.Qual as a byte array filled with 255 (0xFF, the value the BAM format uses for missing qualities), which biogo/seqio encodes as spaces (see here).

However, racon does support fasta files. I've addressed the issue by modifying pinfish_polish so that it generates fasta files rather than fastq when needed. I'll open a pull request shortly.
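
For anyone hitting the same problem, here is a minimal sketch of the idea, not the actual patch: check whether a record's qualities are all 255 and, if so, write it as fasta instead of fastq. The names `qualsMissing` and `formatRecord` are hypothetical and not part of pinfish or biogo. Treating an empty quality slice as missing and capping scores at 93 are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// qualsMissing reports whether a record carries no usable base qualities.
// BAM marks absent qualities by filling the array with 0xFF (255).
// Assumption: an empty slice is also treated as "missing".
func qualsMissing(qual []byte) bool {
	for _, q := range qual {
		if q != 0xFF {
			return false
		}
	}
	return true
}

// formatRecord is a hypothetical helper (not pinfish's real code). It renders
// a read as fasta when qualities are missing, otherwise as fastq with
// Phred+33 encoded qualities.
func formatRecord(name, seq string, qual []byte) string {
	if qualsMissing(qual) {
		return fmt.Sprintf(">%s\n%s\n", name, seq)
	}
	var b strings.Builder
	for _, q := range qual {
		if q > 93 {
			q = 93 // assumption: cap so the encoded byte stays printable ASCII
		}
		b.WriteByte(q + 33)
	}
	return fmt.Sprintf("@%s\n%s\n+\n%s\n", name, seq, b.String())
}

func main() {
	// Read from a fasta-derived BAM: qualities are all 0xFF, so fasta is emitted.
	fmt.Print(formatRecord("read1", "ACGT", []byte{0xFF, 0xFF, 0xFF, 0xFF}))
	// Read with real qualities: fastq is emitted.
	fmt.Print(formatRecord("read2", "ACGT", []byte{30, 30, 20, 10}))
}
```

In a real tool the check would probably be made once per input and the output file extension chosen to match, since racon's parser is selected by file type as the error above suggests.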