nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

dorado demux trimming behaviour #661

Closed phpeters closed 6 months ago

phpeters commented 6 months ago

Dear developers,

Thanks for all the work you're putting into dorado to make it better and better! I thought I'd write a bug report, but it turned out to be more of a misunderstanding report.

What I want to do:

Since "--no-trim" is mentioned in the documentation for "dorado demux", I assumed the default behaviour would be to trim the barcodes and adapters/primers. But using "--no-classify" apparently leaves the reads as they are. Is this intended, is it about to change, or is the documentation simply missing a note that trimming only works together with classification?

Thanks a ton and all the best! Philipp

  --no-classify   Skip barcode classification. Only demux based on existing classification in reads. Cannot be used with --kit-name or --sample-sheet.
  --no-trim       Skip barcode trimming. If option is not chosen, trimming is enabled.
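
The flag semantics quoted above, combined with the behaviour reported in this thread, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `demux_argv` and `will_trim` are hypothetical helper names, the `--output-dir` option is assumed from dorado's usage text (check `dorado demux --help` for your build), and the "no trimming without classification" rule reflects what is described in this issue, not necessarily later releases.

```python
from typing import List, Optional


def demux_argv(reads: str, out_dir: str, kit_name: Optional[str] = None,
               no_classify: bool = False, no_trim: bool = False) -> List[str]:
    """Build an argument list for `dorado demux` (hypothetical helper)."""
    # Documented constraint: --no-classify cannot be used with --kit-name.
    if no_classify and kit_name is not None:
        raise ValueError("--no-classify cannot be combined with --kit-name")
    argv = ["dorado", "demux", "--output-dir", out_dir]  # --output-dir is assumed
    if kit_name is not None:
        argv += ["--kit-name", kit_name]
    if no_classify:
        argv.append("--no-classify")
    if no_trim:
        argv.append("--no-trim")
    argv.append(reads)
    return argv


def will_trim(no_classify: bool, no_trim: bool) -> bool:
    """Behaviour as reported in this thread: trimming needs classification."""
    return not no_classify and not no_trim
```

For example, `will_trim(no_classify=True, no_trim=False)` is `False`, which matches the observation that "--no-classify" leaves the reads as they are even though "--no-trim" was not given.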

Run environment:

tijyojwad commented 6 months ago

Hi @phpeters - your understanding is correct: --no-classify leaves the reads unchanged and only splits them into barcode-specific BAMs. Sorry for the confusion! We will clarify that in the docs in the next update.

This is not ideal, but you can rerun barcode classification on the dataset (classifications will be deterministic for the same build). However, we currently have a bug where dorado demux adds another BC:Z tag instead of overwriting the old one. Since the classifications will be the same, it won't make a difference. This second round will also trim the barcodes.
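
If you want to confirm that the duplicated BC:Z tags really agree after the second round, a minimal sketch along these lines could be run over the text form of the records (for example the output of `samtools view`). The helper names `bc_tags` and `consistent_barcode` are hypothetical, and the barcode value in the usage note is made up for illustration.

```python
from typing import List, Optional


def bc_tags(sam_line: str) -> List[str]:
    """Return every BC:Z value found in one tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory SAM fields; optional tags follow.
    return [f[len("BC:Z:"):] for f in fields[11:] if f.startswith("BC:Z:")]


def consistent_barcode(sam_line: str) -> Optional[str]:
    """Return the barcode if all BC:Z tags agree, None if there is no tag."""
    values = bc_tags(sam_line)
    if not values:
        return None
    if len(set(values)) > 1:
        raise ValueError(f"conflicting BC:Z tags: {values}")
    return values[0]
```

A record carrying `BC:Z:barcode01` twice yields `"barcode01"`, while a record with two different values raises a `ValueError`, which would indicate that the rerun did not reproduce the original classification.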

Hope this helps!

phpeters commented 6 months ago

Hej @tijyojwad ,

Thanks a lot for clarifying this!

"classifications will be deterministic for the same build"

This answers exactly the question I forgot to ask. I reran the classification step and got the same results, and I was hoping that this is deterministic (or seeded).

I will proceed as suggested, thanks a lot! Philipp