lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

`sample` command silently produces incorrect empty reads #213

Open rickymagner opened 5 months ago

rickymagner commented 5 months ago

Hi, I have a fastq with some empty reads like so:

@name

+

@other
AAA
+
999

Running `seqtk sample bad.fastq 0.9` on this produces:

>name

@other
AAA
+
999

which is no longer valid FASTQ: the @name read is now written as a FASTA-style record (a `>name` header followed by an empty sequence line, with no `+` separator or quality line). This breaks many downstream tools. Is it possible to keep it in the original format? I'd like to preserve these reads if they are sampled, because the other read in the pair might be nonempty.
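
In case it helps anyone hitting the same thing, here is a minimal Python sketch for spotting the affected records in sampled output. It is not part of seqtk. The function name `find_non_fastq_records` is made up for illustration, and it assumes unwrapped 4-line FASTQ records and that each FASTA-style record takes exactly two lines (header plus one sequence line), as in the output above.

```python
def find_non_fastq_records(lines):
    """Return (line_number, header) for every record that is not a 4-line FASTQ record.

    Assumptions (not guaranteed by seqtk): FASTQ records are unwrapped
    (exactly 4 lines each) and FASTA-style records are exactly 2 lines.
    """
    bad = []
    i = 0
    while i < len(lines):
        header = lines[i]
        if header.startswith("@"):
            record = lines[i:i + 4]
            ok = (
                len(record) == 4
                and record[2].startswith("+")
                and len(record[1]) == len(record[3])  # sequence and quality lengths match
            )
            if not ok:
                bad.append((i + 1, header))
            i += 4
        elif header.startswith(">"):
            # FASTA-style record in what should be a FASTQ stream
            bad.append((i + 1, header))
            i += 2
        else:
            # Line that does not start a record at all
            bad.append((i + 1, header))
            i += 1
    return bad


# The input and the sampled output from this report, line by line
original = ["@name", "", "+", "", "@other", "AAA", "+", "999"]
sampled = [">name", "", "@other", "AAA", "+", "999"]

print(find_non_fastq_records(original))  # []
print(find_non_fastq_records(sampled))   # [(1, '>name')]
```

The empty read in the original input passes the check because its sequence and quality lines are both empty, so only the record rewritten with a `>` header is reported.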

rickymagner commented 5 months ago

I just saw this issue here, and I can confirm this still happens on the latest version (r130).