s4hts / HTStream

A high throughput sequence read toolset using a streaming approach facilitated by Linux pipes
https://s4hts.github.io/HTStream/
Apache License 2.0

More UMI Stuff (hts_ExtractUMI and hts_SuperDeduper) #264

Closed bnjenner closed 1 week ago

bnjenner commented 1 month ago

So this PR has a number of changes.

  1. Additional parameters for hts_ExtractUMI (EU) and hts_SuperDeduper (SD). EU gets a separator for PE reads and an option to add the UMI as a tag, similar to primers. SD gets a parameter for the column that holds the UMI and a tag option that looks for the UMI in the read's comments.
  2. SD now correctly handles UMIs from paired-end reads (before, it would just double up on the UMI from R1).
  3. EU can now add UMIs in 4 ways: to R1, to R2, to both reads individually, or to both reads in tandem (e.g. ACGT+ACGT). I can clarify this a bit more with an example if needed.
  4. Added tests for the new functionality in EU and SD.
  5. The unset UMI delimiter in SD is now a space, to avoid an invalid character in the JSON files that other programs don't seem to like.
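
To make the four modes in item 3 concrete, here is a rough Python sketch of one way to read them. It is not the HTStream implementation (which is C++). The function name `tag_read_ids`, the mode names, the `_` separator between read ID and UMI, and the choice to append the UMI to the read ID are all assumptions made for illustration. Only the `+` joining the two UMIs in tandem mode comes from the example above.

```python
def tag_read_ids(r1_id, r2_id, umi1, umi2, mode, id_sep="_", pair_sep="+"):
    """Return (new_r1_id, new_r2_id) for one read pair.

    Hypothetical helper: umi1/umi2 are the UMIs already extracted from
    R1 and R2. id_sep and the mode names are assumptions, not HTStream flags.
    """
    if mode == "r1":
        # UMI added to R1 only
        return r1_id + id_sep + umi1, r2_id
    if mode == "r2":
        # UMI added to R2 only
        return r1_id, r2_id + id_sep + umi2
    if mode == "both":
        # each read gets its own UMI
        return r1_id + id_sep + umi1, r2_id + id_sep + umi2
    if mode == "tandem":
        # both reads get the combined UMI, e.g. ACGT+TTGA
        combined = umi1 + pair_sep + umi2
        return r1_id + id_sep + combined, r2_id + id_sep + combined
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    for mode in ("r1", "r2", "both", "tandem"):
        print(mode, tag_read_ids("read1", "read1", "ACGT", "TTGA", mode))
```

In tandem mode both mates end up with the same combined label, which is presumably what lets SD treat the pair's UMIs as one key instead of doubling up on the R1 UMI.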

This one is definitely gonna need some review, so let me know if any of you would like to schedule a Zoom chat to discuss the changes.

bnjenner commented 1 week ago

Matt and I were talking about making this a release. Any thoughts on that, or things to change / bugs to fix before we make that happen?

joe-angell commented 1 week ago

I pinged David on the bug fix he was working on. That's the only thing I can think of atm.

On Thu, Oct 24, 2024 at 11:19 AM Bradley N. Jenner wrote:

Matt and I were talking about making this a release, any thoughts on that or things to change / bugs to fix before we make that happen?

