lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

subseq silent fail on malformatted fq #212

Open cmsoulette opened 5 months ago

cmsoulette commented 5 months ago

seqtk fails silently on malformed FASTQ input.

An aligned BAM was converted to FASTQ with pysam, using the .get_forward_sequence() and .get_forward_qualities() methods. The latter returns an array of integer quality scores (typecode 'B') rather than a string, so writing it out without first converting it to a quality string produces a malformed FASTQ file. Running seqtk subseq on such a file does not report any error.
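A minimal sketch of the conversion step that avoids the problem, assuming standard Phred+33 (Sanger) encoding. The helper names `qualities_to_fastq_string` and `format_fastq_record` are hypothetical, not part of pysam or seqtk; the `array("B", ...)` value stands in for what .get_forward_qualities() returns, so the sketch runs without pysam installed.

```python
from array import array
from typing import Iterable


def qualities_to_fastq_string(quals: Iterable[int], offset: int = 33) -> str:
    """Encode numeric Phred scores as an ASCII quality string (Phred+33 by default)."""
    return "".join(chr(q + offset) for q in quals)


def format_fastq_record(name: str, seq: str, quals: Iterable[int]) -> str:
    """Build a four-line FASTQ record; refuse to emit one whose lengths disagree."""
    qual_str = qualities_to_fastq_string(quals)
    if len(qual_str) != len(seq):
        raise ValueError(
            f"{name}: sequence length {len(seq)} != quality length {len(qual_str)}"
        )
    return f"@{name}\n{seq}\n+\n{qual_str}\n"


# Stand-in for the array returned by get_forward_qualities().
quals = array("B", [3, 3, 3, 3, 2])

# Buggy pattern: interpolating the array directly writes its repr, "array('B', [...])".
print(f"{quals}")

# Correct pattern: encode the scores first.
print(format_fastq_record("read1", "AGGGC", quals), end="")
```

With a real pysam read, the call would presumably look like `format_fastq_record(read.query_name, read.get_forward_sequence(), read.get_forward_qualities())`; note that reads without stored qualities may return None, which this sketch does not handle.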

Example FASTQ entry:

@SRR.ABC.123
AGGGCAATGTACTTCGTTCA.....
+SRR.ABC.123
array('B', [3, 3, 3, 3, 2,....])
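Until seqtk reports such input, a pre-check can catch records like this one before running seqtk subseq. The following is a hypothetical sketch, not seqtk behaviour: `find_malformed_fastq` is an illustrative name, and it assumes the simple four-line FASTQ layout (no wrapped sequence or quality lines) with Phred+33 qualities.

```python
import io
from typing import Iterator, TextIO, Tuple


def find_malformed_fastq(handle: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (record_number, reason) for each four-line FASTQ record that looks malformed."""
    record = 0
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        record += 1
        if not header.startswith("@"):
            yield record, "header line does not start with '@'"
        elif not plus.startswith("+"):
            yield record, "separator line does not start with '+'"
        elif len(seq) != len(qual):
            yield record, f"sequence length {len(seq)} != quality length {len(qual)}"
        elif any(not 33 <= ord(c) <= 126 for c in qual):
            yield record, "quality string has characters outside the Phred+33 range"


# A record shaped like the one above (qualities written as an array repr), then a valid one.
bad = "@r1\nAGGGC\n+r1\narray('B', [3, 3, 3, 3, 2])\n"
good = "@r2\nAGGGC\n+\n$$$$#\n"
for number, reason in find_malformed_fastq(io.StringIO(bad + good)):
    print(f"record {number}: {reason}")
```

The length comparison is what flags the array-repr case, since the printed repr is almost never exactly as long as the sequence; the character-range check is a secondary guard for other encoding mistakes.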