roryk closed 8 years ago
I figured it was quicker to just do the counting without demultiplexing if possible, since there are already tools for demultiplexing.
If you think it would be beneficial to have a demultiplexing subcommand, it wouldn't hurt to have it there.
It should be noted though that at the moment I'm handling demultiplexing by exact matching. With data I've handled, this "only" throws away 3% of the reads. Meanwhile, other demultiplexing tools do it in a way that allows some errors.
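To make the difference concrete, here is a minimal sketch of exact matching versus matching that tolerates a few mismatches. This is not the umis code; `hamming`, `match_barcode`, and `max_mismatches` are hypothetical names made up for this illustration, and it assumes all barcodes have the same length.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed, allowed, max_mismatches=0):
    """Return the allowed barcode matching `observed`, or None.

    With max_mismatches=0 this is plain exact matching. With a larger value,
    the read is assigned only if exactly one allowed barcode is within range,
    so ambiguous reads are still thrown away.
    """
    if observed in allowed:
        return observed
    if max_mismatches == 0:
        return None
    hits = [bc for bc in allowed
            if len(bc) == len(observed) and hamming(bc, observed) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


allowed = {"AAAA", "CCCC", "GGTT"}
print(match_barcode("AAAT", allowed))                    # None (exact matching drops the read)
print(match_barcode("AAAT", allowed, max_mismatches=1))  # AAAA (one mismatch tolerated)
```

The tolerant version scans every allowed barcode for each non-exact read, so it is slower than a set lookup; that is the trade-off against the reads recovered.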
The file of allowed barcodes is already implemented: this is the --cb_filter option. I use it a lot for e.g. MARS-Seq and CEL-Seq data.
Thanks Valentine, what do you think about having the cb_filter option be a subcommand to decouple it from tagcount? Then you could do something like `umis fastqtransform foobar | umis cb_filter --barcode-list barcodes - | do streaming alignment` with a cleaned file or whatnot.
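A rough sketch of what such a streaming filter could look like, not an actual implementation of the subcommand. It assumes the cell barcode sits in the read name as `CELL_<barcode>`; that naming convention, the regex, and the function names are assumptions for illustration only.

```python
import io
import re
from itertools import islice

# Assumed read-name convention for this sketch: ...:CELL_<barcode>:...
CELL_RE = re.compile(r"CELL_([ACGTN]+)")


def fastq_records(handle):
    """Yield FASTQ records as lists of 4 lines from a text stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def filter_by_barcode(handle, allowed):
    """Yield only the records whose cell barcode is in `allowed` (exact match)."""
    for record in fastq_records(handle):
        m = CELL_RE.search(record[0])
        if m and m.group(1) in allowed:
            yield record


# Small in-memory demo; in a pipe, `handle` would be sys.stdin
# and the records would be written to sys.stdout.
demo = io.StringIO(
    "@r1:CELL_AAAA:UMI_ACGT\nACGT\n+\nIIII\n"
    "@r2:CELL_TTTT:UMI_ACGT\nTTTT\n+\nIIII\n"
)
for rec in filter_by_barcode(demo, {"AAAA"}):
    print("".join(rec), end="")
```

Because it is a generator over the input stream, nothing is held in memory beyond the current record, which is what makes it usable in the middle of a pipe.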
That's a great idea! This will avoid having to put a bunch of checks in every iteration of the tallying loop.
Hi Valentine,
What do you think about adding an option to demultiplex the barcodes into separate files, named by the barcode? We could also pass along a file of allowed barcodes to match and filter out non-matching barcodes as we go. I don't want to muck up your repo with functionality you weren't intending though.
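Roughly what I have in mind, as a sketch only. The function name, the `.fq` naming scheme, and the `(barcode, fastq_text)` input format are all hypothetical and would need to be adapted to however the barcode is actually extracted.

```python
import os
import tempfile
from contextlib import ExitStack


def demultiplex(records, out_dir, allowed=None):
    """Write (barcode, fastq_text) pairs to <out_dir>/<barcode>.fq.

    If `allowed` is given, records with any other barcode are dropped.
    Returns a dict mapping barcode -> number of records written.
    """
    counts = {}
    with ExitStack() as stack:
        handles = {}  # one open file per barcode seen so far
        for barcode, text in records:
            if allowed is not None and barcode not in allowed:
                continue
            if barcode not in handles:
                path = os.path.join(out_dir, barcode + ".fq")
                handles[barcode] = stack.enter_context(open(path, "w"))
            handles[barcode].write(text)
            counts[barcode] = counts.get(barcode, 0) + 1
    return counts


records = [
    ("AAAA", "@r1\nACGT\n+\nIIII\n"),
    ("CCCC", "@r2\nCCGT\n+\nIIII\n"),
    ("AAAA", "@r3\nAAGT\n+\nIIII\n"),
    ("TTTT", "@r4\nTTGT\n+\nIIII\n"),  # not in the allowed list, dropped
]
with tempfile.TemporaryDirectory() as out_dir:
    counts = demultiplex(records, out_dir, allowed={"AAAA", "CCCC"})
    print(sorted(os.listdir(out_dir)), counts)
```

One caveat: this keeps a file handle open per barcode, so with thousands of cells it could run into the OS limit on open files, which is another reason to filter against an allowed list first.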