Split output based on native barcodes?

rrwick / Porechop

adapter trimmer for Oxford Nanopore reads

GNU General Public License v3.0


Split output based on native barcodes? #2

Closed: nextgenusfs closed this issue 7 years ago

nextgenusfs commented 7 years ago

Hi Ryan,

This looks like a great tool. Would it be possible to modify it so that reads with native barcodes can be split into individual output files, e.g. NB001.trim.fasta, NB002.trim.fasta, unassigned.fa, etc.? I'm looking for a faster way than Metrichor to demultiplex the reads. I could write something myself, but it seems like you have a very fast implementation here, and it maybe wouldn't be that hard to add an option to split the output.

Thanks,
Jon
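To make the requested output concrete, here is a minimal Python sketch of just the "split into per-barcode files" step. It assumes each read has already been trimmed and assigned a barcode label (or `None` when no barcode could be called) by some upstream step; the barcode detection itself is not shown. The function names `bin_reads` and `write_bins` and the file naming scheme are hypothetical, loosely following the example names in the request, and this is not how Porechop implements binning.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

# A read is (name, trimmed sequence, assigned barcode or None).
# Barcode assignment is assumed to have happened upstream (hypothetical input).
Read = Tuple[str, str, Optional[str]]


def bin_reads(reads: Iterable[Read]) -> Dict[str, str]:
    """Group reads by barcode and return {output filename: FASTA text}."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for name, seq, barcode in reads:
        # Reads without a barcode call go to a catch-all "unassigned" bin.
        filename = f"{barcode}.trim.fasta" if barcode else "unassigned.fasta"
        bins[filename].append(f">{name}\n{seq}\n")
    return {filename: "".join(records) for filename, records in bins.items()}


def write_bins(reads: Iterable[Read], out_dir: str) -> None:
    """Write each barcode bin to its own FASTA file inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, fasta in bin_reads(reads).items():
        (out / filename).write_text(fasta)


# Usage with made-up reads:
reads = [
    ("read1", "ACGTACGT", "NB001"),
    ("read2", "TTGACCA", "NB002"),
    ("read3", "GGGCCC", None),
    ("read4", "CATCAT", "NB001"),
]
files = bin_reads(reads)
print(sorted(files))  # ['NB001.trim.fasta', 'NB002.trim.fasta', 'unassigned.fasta']
```

Keeping the grouping (`bin_reads`) separate from the file I/O (`write_bins`) means the binning logic can be checked without touching the disk.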

nextgenusfs commented 7 years ago

I saw that you also have a barcode-binner repository. I will try that one, but ideally it would be nice if these two projects were combined, so you could bin and trim in one step.

rrwick commented 7 years ago

Hi Jon,

Yes, I do have a barcode binner, but it's a bit rough (as you can see by the lack of a README). And yes, I agree that adding that binning functionality into Porechop would be a good idea!

That being said, Clive Brown's recent tech update revealed that a lot about basecalling is going to change soon. Metrichor basecalling is going away in a few days! I'm not sure whether it will still be available for barcode binning. And the local MinKNOW basecalling is apparently much improved. So I'm waiting to try these things out and for the dust to settle. If it turns out that MinKNOW's local basecalling does a great job with barcode binning, then adding that feature to Porechop isn't a big priority for me. On the other hand, if MinKNOW's local binning leaves much to be desired, then I'll definitely consider prioritising this feature!

Thanks for the idea, and I'll leave this issue open for now.

Ryan

nextgenusfs commented 7 years ago

Hi Ryan,

Yes, I appreciate the hesitancy, given the speed at which Nanopore seems to change formats, speeds, kits, etc. There is currently no way to demultiplex reads from a barcoded run in MinKNOW. They will probably implement that feature in coming releases, but right now there aren't many other options. Besides, I still think your software, with its ability to handle noisy reads and demultiplex samples, will be useful for several different datasets, and it will likely be faster given that you write very efficient code. One of these days I need to take the plunge and learn C++ and the SeqAn library.

rrwick commented 7 years ago

I had a few requests for this, so I've implemented it in the new release of Porechop. Grab the current version and give it a try!

nextgenusfs commented 7 years ago

Great, thanks Ryan. Will give it a go.