lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

added -B/-E to trimfq for keeping first/last INT bp and also -s for shortest read #91

Open ndaniel opened 7 years ago

ndaniel commented 7 years ago

This is essentially a resurrected version of https://github.com/lh3/seqtk/pull/38, brought up to date with the latest release of seqtk so that the new options no longer interfere with seqtk's original command-line options.

More precisely, this adds the following options to trimfq:

-s INT      trimming by -b/-e/-B/-E shall not produce reads shorter than INT bp
-B INT      keep first INT bp from left (non-zero to disable -q/-e/-E)
-E INT      keep last INT bp from right (non-zero to disable -q/-b/-B)
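To make the interplay of these options concrete, here is a minimal Python sketch of the kept window [beg, end) for one read. It is a hypothetical model written from the option descriptions above, not the code in this patch. Assumptions: -b/-e remove a fixed number of bases from the left/right end (as with trimfq's existing options), -B is applied after -b, -E is applied after -e, -B wins if both -B and -E are given, -q quality trimming is not modelled, and a result shorter than the -s value leaves the read untrimmed. The names `trim_window` and `trim` are illustrative only.

```python
from typing import Tuple


def trim_window(length: int, b: int = 0, e: int = 0, B: int = 0,
                E: int = 0, s: int = 0) -> Tuple[int, int]:
    """Return the kept window [beg, end) of a read of `length` bp.

    Hypothetical model of -b/-e/-B/-E/-s. ASSUMPTIONS (not from the patch):
    -B applied after -b, -E applied after -e, -B wins over -E,
    -q is ignored, and a result shorter than s bp leaves the read untrimmed.
    """
    if B > 0:                      # -B: keep first B bp; -e/-E ignored
        beg = min(b, length)
        end = min(beg + B, length)
    elif E > 0:                    # -E: keep last E bp; -b ignored
        end = max(length - e, 0)
        beg = max(end - E, 0)
    else:                          # plain -b/-e trimming
        beg = min(b, length)
        end = max(length - e, beg)
    if end - beg < s:              # -s: assumed fallback to the untrimmed read
        beg, end = 0, length
    return beg, end


def trim(seq: str, **opts: int) -> str:
    """Apply trim_window() to a sequence string."""
    beg, end = trim_window(len(seq), **opts)
    return seq[beg:end]


if __name__ == "__main__":
    read = "ACGTACGTAC"                # 10 bp toy read
    print(trim(read, B=4))             # ACGT
    print(trim(read, E=4))             # GTAC
    print(trim(read, b=2, e=3))        # GTACG
    print(trim(read, b=8, e=5, s=3))   # ACGTACGTAC (would be empty, so kept whole)
```

The last call shows why -s matters under this reading: -b 8 with -e 5 would otherwise leave an empty read, and -s 3 makes the sketch keep the read whole instead.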

This allows more precise control over how trimming is done. This kind of trimming is used heavily in FusionCatcher (via a forked seqtk instead of the original one). In https://github.com/lh3/seqtk/pull/38 it was mentioned that this kind of trimming is rare, but in practice it is used a lot. Regarding the popularity of such trimming, for example:

yhoogstrate commented 6 years ago

Any news here, @lh3 @ndaniel?