lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

Not sorting /1 and /2 reads properly. #170

Closed: jallmer closed this issue 3 years ago.

jallmer commented 3 years ago

I just cloned and built seqtk (with make).

I wanted to use it to split a mixed FASTQ file into two files, one for each mate of a read pair. According to seqtk seq -h, I could use -1 and -2 for that:

    seqtk seq -1 toAssemble_mixedPairs.fastq > toAssemble_1.fastq
    seqtk seq -2 toAssemble_mixedPairs.fastq > toAssemble_2.fastq
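For context: the seqtk seq -h help text describes -1 as "output the 2n-1 reads only" and -2 as "output the 2n reads only", so the two options pick records by their position in the file (odd vs. even), not by the /1 or /2 in the read name. The awk function below sketches that positional selection. It is an illustration only, not seqtk's implementation; take_records is a hypothetical name, and the sketch assumes plain 4-line FASTQ records.

```shell
# Illustration only (not seqtk code): mimic the positional selection of
# `seqtk seq -1` / `-2`, assuming plain 4-line FASTQ records.
# Records are numbered from 1; "odd" keeps records 2n-1, "even" keeps 2n.
# The read names are never inspected.
take_records() {
  # $1: "odd" or "even"; $2: input FASTQ file
  awk -v want="$1" '
    {
      rec = int((NR - 1) / 4) + 1    # 1-based record number of this line
      if ((want == "odd" && rec % 2 == 1) || (want == "even" && rec % 2 == 0))
        print
    }
  ' "$2"
}
```

With records ordered @a/1, @b/2, @c/2, the odd selection keeps @a/1 and @c/2, so /2 reads land in the "_1" output whenever the input is not strictly alternating /1, /2.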

I ran 'grep /1 toAssemble_2.fastq' to confirm that there are no /1 reads in that file. That seems fine.

The toAssemble_1.fastq file, however, contains /2 entries. Counting them gives about 70 million /1 and 280 million /2 reads (from a ~80 GB mixed file).
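A caveat about checking with grep: grep /1 matches "/1" anywhere on a line, and both / and 1 are valid Phred+33 quality characters, so quality lines can produce false hits. A sketch that only inspects the header line of each record, assuming 4-line records and that the mate suffix ends the first whitespace-separated field (count_mates is a hypothetical helper, not part of seqtk):

```shell
# Count /1 and /2 read names in a FASTQ, looking only at header lines
# (line 1 of each 4-line record). Only the first whitespace-separated
# field, i.e. the read name, is examined; sequence and quality lines
# are ignored.
count_mates() {
  awk '
    BEGIN { n1 = 0; n2 = 0; other = 0 }
    NR % 4 == 1 {
      if ($1 ~ /\/1$/)      n1++
      else if ($1 ~ /\/2$/) n2++
      else                  other++
    }
    END { printf "/1: %d  /2: %d  other: %d\n", n1, n2, other }
  ' "$1"
}
```

For example, count_mates toAssemble_1.fastq reports the per-suffix totals for that file. On a two-record file whose first quality string happens to be "/1", grep -c /1 reports 2 lines, while count_mates counts a single /1 read.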

Any ideas?

lh3 commented 3 years ago

The input fastq has to be interleaved.
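Because -1 and -2 only yield clean mate files when the input is interleaved (record 2n-1 is mate /1 and record 2n is the matching mate /2), it can help to verify that before splitting. A minimal sketch, assuming 4-line records and /1 and /2 suffixes on the read name; check_interleaved is a hypothetical helper, not a seqtk command:

```shell
# Check that a FASTQ is interleaved: record 2n-1 must be "<name>/1" and
# record 2n must be "<name>/2" with the same <name>. Assumes 4-line records.
# Prints the first offending record and returns 1; returns 0 if all is well.
check_interleaved() {
  awk '
    BEGIN { rec = 0; bad = 0 }
    NR % 4 == 1 {
      rec++
      name = $1
      sub(/^@/, "", name)                  # drop the leading "@"
      if (rec % 2 == 1) {
        # odd record: must be mate 1; remember its base name
        if (name !~ /\/1$/) {
          print "record " rec ": expected a /1 read, got " name
          bad = 1; exit
        }
        base = substr(name, 1, length(name) - 2)
      } else if (name != base "/2") {
        # even record: must be the /2 mate of the preceding /1 read
        print "record " rec ": expected " base "/2, got " name
        bad = 1; exit
      }
    }
    END {
      if (!bad && rec % 2 == 1) { print "odd number of records"; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

For example: check_interleaved toAssemble_mixedPairs.fastq && seqtk seq -1 toAssemble_mixedPairs.fastq > toAssemble_1.fastq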