lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

Trimming paired end reads with seqtk #92

Closed divy-kangeyan closed 6 years ago

divy-kangeyan commented 7 years ago

I used seqtk trimfq to trim paired-end FASTQ files separately. After trimming, when I try to align the reads, I get an error saying "Paired read names do not match".

Is there a way to trim both files of a paired-end run together, so the pairs stay in sync?

tseemann commented 6 years ago

@Divyagash Trimming can remove one read of a pair entirely, so the two files may no longer contain matching pairs. Use seqtk dropse to drop the reads that are left without a mate.
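To illustrate what dropping unpaired reads means, here is a rough awk sketch of the idea. It is not seqtk's code: `drop_unpaired` is a hypothetical helper, and it assumes an interleaved stream of plain 4-line FASTQ records in which mates sit next to each other and share a name, optionally ending in /1 or /2.

```shell
# Illustration only: approximate "drop reads without a mate" on interleaved FASTQ.
# Reads from stdin, writes the surviving pairs to stdout.
drop_unpaired() {
    awk '
    {
        rec[(NR - 1) % 4] = $0              # collect the four lines of one record
    }
    NR % 4 == 0 {
        name = substr(rec[0], 2)            # header without the leading "@"
        sub(/[ \t].*$/, "", name)           # drop any comment after the name
        sub(/\/[12]$/, "", name)            # drop a trailing /1 or /2
        cur = rec[0] "\n" rec[1] "\n" rec[2] "\n" rec[3]
        if (have_prev && name == prev_name) {
            print prev                      # adjacent records share a name:
            print cur                       # keep both mates
            have_prev = 0
        } else {
            prev = cur                      # hold this record until we see
            prev_name = name                # whether the next one is its mate
            have_prev = 1
        }
    }'
}
```

A record that is never followed by a record with the same name is simply never printed, which is why the output contains only complete pairs.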

Here's one workflow you can use:

seqtk mergepe R1 R2 | seqtk trimfq - | seqtk dropse > R12.trim
seqtk seq -1 R12.trim | gzip > R1.trim.gz
seqtk seq -2 R12.trim | gzip > R2.trim.gz
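Before aligning, it can help to confirm that the two output files really are in sync. The following is a minimal sketch, not a seqtk feature: `fastq_names` and `check_pairs` are hypothetical helpers, and they assume uncompressed 4-line FASTQ records whose names may end in /1 or /2.

```shell
# Print one read name per record: header up to the first whitespace,
# without the leading "@" and without a trailing /1 or /2.
fastq_names() {
    awk 'NR % 4 == 1 { name = substr($1, 2); sub(/\/[12]$/, "", name); print name }' "$1"
}

# Compare the read names of two FASTQ files record by record.
# Prints "names match" (exit status 0) or "names differ" (exit status 1).
check_pairs() {
    tmp1=$(mktemp)
    tmp2=$(mktemp)
    fastq_names "$1" > "$tmp1"
    fastq_names "$2" > "$tmp2"
    if cmp -s "$tmp1" "$tmp2"; then
        echo "names match"
        rc=0
    else
        echo "names differ"
        rc=1
    fi
    rm -f "$tmp1" "$tmp2"
    return "$rc"
}
```

For the gzipped files produced above, decompress first (for example `gzip -dc R1.trim.gz > R1.trim.fq`, and likewise for R2) and then run `check_pairs R1.trim.fq R2.trim.fq`.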

If you are using Bash or Zsh, you can do it all in a single pipeline:

seqtk mergepe R1 R2 \
| seqtk trimfq - \
| seqtk dropse \
| tee >(seqtk seq -1 | gzip > R1.trim.gz) \
| seqtk seq -2 | gzip > R2.trim.gz

tseemann commented 6 years ago

@Divyagash can you close this issue please?