lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

seqtk subseq inserting line break after 1024 characters #143

Closed by nqp3 5 years ago

nqp3 commented 5 years ago

I'm attempting to use seqtk subseq on some long reads, but I was getting a lot of 6+ line entries in my .fastq:

readID
sequence line 1
sequence line 2
+
Qscores line 1
Qscores line 2

After some messing around, I realized it was splitting the reads and Qscores into separate lines after 1024 characters.

I (obviously) need my .fastq to keep the standard 4-line format.

Is there an option or fix for this that I am missing? Thanks!
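A quick way to confirm the symptom described above is to check whether a file still follows the strict 4-line layout. The following is only a minimal sketch: `check_fastq_4line` is a hypothetical helper name, not part of seqtk, and it assumes a plain uncompressed FASTQ file and a POSIX awk.

```shell
# check_fastq_4line FILE   (hypothetical helper, not a seqtk command)
# Exits 0 if FILE looks like strict 4-line FASTQ:
#   - the total line count is a multiple of 4
#   - line 1 of each record starts with "@", line 3 starts with "+"
#   - the sequence line and the quality line have the same length
check_fastq_4line() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
        NR % 4 == 0 && length($0) != seqlen    { bad = 1 }
        END { if (bad || NR % 4 != 0) exit 1 }
    ' "$1"
}
```

Usage would be something like `check_fastq_4line out.fastq && echo "4-line format"`. A record wrapped at 1024 characters fails the check because its third line is more sequence rather than `+`.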

shenwei356 commented 5 years ago

Piped to seqtk seq?

seqtk subseq | seqtk seq

nqp3 commented 5 years ago

RTFM

$ seqtk subseq
Usage:   seqtk subseq [options] <in.fa> <in.bed>|<name.list>
Options: -t       TAB delimited output
         -l INT   sequence line length [1024]

Looks like you can change the line length with -l. I'll give this a shot.

seqtk seq also seems to work

seqtk subseq in.fastq list.lis > bad.fastq
seqtk seq bad.fastq > good.fastq

good.fastq has the right number of lines. I'll use one of these solutions.
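For anyone curious what re-joining wrapped records involves, here is a rough awk sketch of the idea. It is not seqtk's implementation, and `unwrap_fastq` is a hypothetical name. It assumes well-formed input where each record's quality string is exactly as long as its sequence, which is what lets it tell a quality line starting with `@` apart from the next header.

```shell
# unwrap_fastq [FILE|-]   (hypothetical helper, not a seqtk command)
# Joins wrapped sequence and quality lines so every record is 4 lines.
# With no argument, or with "-", input is read from standard input.
unwrap_fastq() {
    awk '
        state == 0 {                       # expecting a header line
            if ($0 == "") next             # skip stray blank lines
            print; seq = ""; qual = ""; state = 1; next
        }
        state == 1 {                       # collecting sequence lines
            if (substr($0, 1, 1) == "+") { print seq; print; state = 2 }
            else seq = seq $0
            next
        }
        state == 2 {                       # collecting quality lines
            qual = qual $0
            # the record ends once quality is as long as the sequence
            if (length(qual) >= length(seq)) { print qual; state = 0 }
            next
        }
    ' "${1:--}"
}
```

Something like `unwrap_fastq bad.fastq > good.fastq` would then play the role of the second command above. In practice `seqtk seq` is the better-tested choice.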

nqp3 commented 5 years ago

How would that pipe work?

seqtk subseq in.fastq list.list | seqtk seq > out.fastq

isn't working, 0 lines in out.fastq.

seqtk seq <(seqtk subseq in.fastq list.list) > out.fastq

works though.

shenwei356 commented 5 years ago

seqtk subseq in.fastq list.list | seqtk seq - > out.fastq
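The `-` in that command follows the common command-line convention that an input argument of `-` means "read standard input". The sketch below illustrates the convention with a stand-in filter, `to_upper`, which is a hypothetical function and not part of seqtk. It mimics a tool that insists on an input argument and only prints a usage message when none is given. Presumably that is why the pipe without `-` produced an empty out.fastq, but this is an assumption based on the behavior reported in the thread, not something confirmed in it.

```shell
# to_upper FILE|-   (hypothetical stand-in for a tool that requires an input argument)
to_upper() {
    if [ "$#" -eq 0 ]; then
        # no input argument: print usage to stderr, write nothing to stdout
        echo "usage: to_upper FILE|-" >&2
        return 1
    fi
    if [ "$1" = "-" ]; then
        tr 'a-z' 'A-Z'            # "-" means: read standard input
    else
        tr 'a-z' 'A-Z' < "$1"     # otherwise read the named file
    fi
}

# Piping with an explicit "-" reaches the filter:
#   printf 'acgt\n' | to_upper -
# Piping without any argument only triggers the usage message:
#   printf 'acgt\n' | to_upper
```

The process-substitution form from the earlier comment works for a different reason: bash replaces `<(...)` with a file name, so the tool sees an ordinary input argument.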