s4hts / HTStream

A high throughput sequence read toolset using a streaming approach facilitated by Linux pipes
https://s4hts.github.io/HTStream/
Apache License 2.0
49 stars 9 forks

New Tool: hts_ExtractUMI #257

Closed by bnjenner 4 months ago

bnjenner commented 4 months ago

Hey there,

This PR adds a new tool, hts_ExtractUMI. As the name implies, it is meant to be a drop-in replacement for umi_tools extract, with some functionality changed so that it fits the HTStream philosophy (mainly streaming) and some single-cell pipelines a bit better. It is based on a Python script Matt wrote, so I had a good idea of where to start. I tried my best to preserve the style and organization of HTStream, and I also wrote a set of test functions. As of now, all tests pass for every function, and the output of the other programs is unaffected, so I am fairly confident nothing broke even though I added some functions to read.h and utils.h.
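
For anyone unfamiliar with UMI extraction, here is a rough Python sketch of the general idea in a streaming setting. It is not the code in this PR and not Matt's script: the function name `extract_umi`, the fixed 6-base UMI at the start of the read, and the `_` separator are all assumptions chosen for illustration, loosely following the umi_tools extract convention of appending the UMI to the read ID.

```python
def extract_umi(lines, umi_len=6, sep="_"):
    """Move the first `umi_len` bases of each FASTQ read into its read ID.

    Hypothetical helper for illustration only; the UMI position, length,
    and separator are assumptions, not the behavior of hts_ExtractUMI.
    """
    # Strip newlines lazily so records are processed one at a time (streaming).
    it = (line.rstrip("\n") for line in lines)
    # A FASTQ record is 4 lines: header, sequence, '+', qualities.
    # zip over the same iterator four times groups lines into records
    # and silently drops a truncated trailing record.
    for header, seq, plus, qual in zip(it, it, it, it):
        # Keep any description after the first space intact.
        read_id, _, desc = header.partition(" ")
        umi = seq[:umi_len]
        new_header = read_id + sep + umi
        if desc:
            new_header += " " + desc
        # Trim the UMI bases and their quality scores from the read.
        yield "\n".join([new_header, seq[umi_len:], plus, qual[umi_len:]]) + "\n"
```

Fed `sys.stdin` and written to `sys.stdout`, a generator like this could sit in the middle of a pipe and never hold more than one record in memory, which is the streaming property the tool is aiming for.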

I imagine you guys will have some questions and comments, and will want some changes (particularly to the name, lol), but I figured now is a good time to get some other eyes on the code and bring attention to this PR. I don't want to dump a bunch of code into the repo all at once.