lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License
1.37k stars 308 forks

added -B/-E to trimfq for keeping first/last INT bp and also -s for shortest read #90

Closed: ndaniel closed this 7 years ago

ndaniel commented 7 years ago

This is basically a resurrection of https://github.com/lh3/seqtk/pull/38, brought up to date with the latest release of seqtk so that the new options no longer interfere with seqtk's original command-line options.

More precisely, this adds the following options to trimfq:

-s INT      trimming by -b/-e/-B/-E shall not produce reads shorter than INT bp
-B INT      keep first INT bp from left (non-zero to disable -q/-e/-E)
-E INT      keep last INT bp from right (non-zero to disable -q/-b/-B)
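
To illustrate the intended behaviour, here is a minimal Python sketch of the keep-first/keep-last logic. It is not the C code from this pull request. The function name `trim_read` and its parameters are hypothetical, and two points are assumptions, since the option text above does not spell them out: that -B wins when both -B and -E are non-zero, and that -s acts as a floor on the number of bases kept.

```python
def trim_read(seq, qual, keep_first=0, keep_last=0, min_len=0):
    """Trim one FASTQ record, loosely mimicking the proposed -B/-E/-s options.

    keep_first ~ -B: keep the first INT bp from the left.
    keep_last  ~ -E: keep the last INT bp from the right.
    min_len    ~ -s: ASSUMPTION: never keep fewer than this many bases.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")

    n = len(seq)
    beg, end = 0, n  # default: leave the read untouched

    if keep_first > 0:
        # ASSUMPTION: -B takes precedence when both -B and -E are set.
        end = min(n, max(keep_first, min_len))
    elif keep_last > 0:
        beg = max(0, n - max(keep_last, min_len))

    # Sequence and qualities are always sliced with the same coordinates.
    return seq[beg:end], qual[beg:end]


if __name__ == "__main__":
    read, quals = "ACGTACGTAC", "0123456789"
    print(trim_read(read, quals, keep_first=4))
    print(trim_read(read, quals, keep_last=3))
    print(trim_read(read, quals, keep_first=4, min_len=6))
```

Reads shorter than the requested length come back unchanged in this sketch. Whether the actual patch does the same should be checked against the code.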

This allows more precise control over how trimming is done. This kind of trimming is used heavily in FusionCatcher (which uses a forked seqtk instead of the original seqtk). In https://github.com/lh3/seqtk/pull/38 it was mentioned that this kind of trimming is rare, but in practice it is used a lot. Regarding the popularity of such trimming, for example:

lh3 commented 7 years ago

There are too many changes.

lh3 commented 7 years ago

Please make changes on the latest master branch. Thanks.

ndaniel commented 7 years ago

@lh3 Very good point! It is done here: https://github.com/lh3/seqtk/pull/91