lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

seqtk subseq error id with whitespace #141

Closed fengyq closed 5 years ago

fengyq commented 5 years ago

Hi Heng,

When the ID in the name line of a FASTQ file contains whitespace, seqtk subseq extracts only the first base of each sequence.

fastq file

@M01326:74:000000000-A6B33:1:1101:18968:2222 1:N:0:0
GGACTCCT
+
CCBCCFFF
@M01326:74:000000000-A6B33:1:1101:14853:2501 1:N:0:0
GGACTCCT
+
CCCCCFFF
@M01326:74:000000000-A6B33:1:1101:10560:2991 1:N:0:0
GGACTCCT

name.list

M01326:74:000000000-A6B33:1:1101:18968:2222 1:N:0:0
M01326:74:000000000-A6B33:1:1101:14853:2501 1:N:0:0
M01326:74:000000000-A6B33:1:1101:10560:2991 1:N:0:0

fengyq commented 5 years ago
seqtk subseq  t1.fq  t1.lst

The output looks like this:

@M01326:74:000000000-A6B33:1:1101:18968:2222:1-1 1:N:0:0
G
+
C
@M01326:74:000000000-A6B33:1:1101:14853:2501:1-1 1:N:0:0
G
+
C
>M01326:74:000000000-A6B33:1:1101:10560:2991:1-1 1:N:0:0
G

lh3 commented 5 years ago

Don't put 1:N:0:0 in the list. It is not part of the FASTQ name.
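
For anyone who built the list by copying whole header lines, here is a minimal sketch of one way to apply the advice above: keep only the first whitespace-delimited token of each line, so the trailing 1:N:0:0 comment is dropped. The function names `read_name` and `clean_name_list` are hypothetical helpers for illustration, not part of seqtk. The ":1-1" suffix in the output suggests the extra column was being read as a region coordinate, but that reading is an assumption and is not confirmed in this thread.

```python
def read_name(line: str) -> str:
    """Return the first whitespace-delimited token of a line ('' if blank)."""
    fields = line.split()
    return fields[0] if fields else ""


def clean_name_list(lines):
    """Keep only the read name from each non-blank line (hypothetical helper)."""
    return [read_name(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # The three entries from the name list posted in this issue.
    raw = [
        "M01326:74:000000000-A6B33:1:1101:18968:2222 1:N:0:0",
        "M01326:74:000000000-A6B33:1:1101:14853:2501 1:N:0:0",
        "M01326:74:000000000-A6B33:1:1101:10560:2991 1:N:0:0",
    ]
    # Print one cleaned name per line, ready to be saved as a list file.
    for name in clean_name_list(raw):
        print(name)
```

Redirecting the printed names to a file gives a list with one read name per line, which can then be passed to seqtk subseq in place of the original list.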