dorado demux trimming behaviour

phpeters commented 6 months ago

Dear developers,

Thanks for all the work you're putting into dorado to make it better and better! I thought I would write a bug report, but it turned out to be more of a misunderstanding report.

What I want to do:

1. Do basecalling and leave reads untrimmed for later analyses (e.g. feeding to wf-transcriptome):
   dorado basecaller $model $inputpod5 --sample-sheet ... --kit-name ... --no-trim
2. Demux the dataset and trim for straightforward analysis:
   dorado demux ... --no-classify $inputbam

I thought that, since "--no-trim" is mentioned in the documentation for "dorado demux", the default behaviour would be to trim the barcodes and adapters/primers. But using "--no-classify" apparently leaves the reads as they are. Is this intended, about to be changed, or is a note that "trimming works only with classification" simply missing from the documentation?

Thanks a ton and all the best! Philipp

  --no-classify    Skip barcode classification. Only demux based on existing classification in reads. Cannot be used with --kit-name or --sample-sheet.
  --no-trim        Skip barcode trimming. If option is not chosen, trimming is enabled.

Run environment:

Dorado version: v0.5.3
Dorado command: dorado demux
Operating system: linux-x64
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): bam
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): PRO114_DNA_e8_2_400K:FLO-PRO114M:SQK-LSK114:400

tijyojwad commented 6 months ago

Hi @phpeters - your understanding is correct: --no-classify will leave the reads unchanged and only split them into barcode-specific BAMs. Sorry for the confusion! We will clarify that in the docs in the next update.
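The behaviour described in this reply can be summarised as a small truth table. The Python sketch below is only an illustration of what the thread reports for v0.5.3, not dorado's actual implementation; the function name `demux_trims_barcodes` is hypothetical.

```python
def demux_trims_barcodes(no_classify: bool, no_trim: bool) -> bool:
    """Model of the trimming behaviour reported in this thread (v0.5.3).

    Hypothetical helper, not part of dorado: trimming is reported to
    happen only when demux classifies the reads itself and --no-trim
    is not given.
    """
    return (not no_classify) and (not no_trim)


# Enumerate all four flag combinations.
for no_classify in (False, True):
    for no_trim in (False, True):
        trims = demux_trims_barcodes(no_classify, no_trim)
        print(f"--no-classify={no_classify!s:5} --no-trim={no_trim!s:5} -> trims: {trims}")
```

Only the combination without either flag results in trimmed reads, which is why "dorado demux --no-classify" left the reads as they were.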

This is not ideal, but you can rerun barcode classification on the dataset (classifications will be deterministic for the same build). We currently have a bug where dorado demux adds another BC:Z tag instead of overwriting the old one, but since the classifications will be the same, it won't make a difference. This second round will also trim the barcodes.
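As a rough sketch of that rerun, the Python snippet below assembles a "dorado demux" command line that classifies again (and therefore trims) instead of using --no-classify. The helper name `build_reclassify_command`, the file names, and the kit placeholder are assumptions for illustration; --kit-name, --sample-sheet, and --output-dir are existing demux options, but check "dorado demux --help" for your version before running anything.

```python
import shlex
from typing import List, Optional


def build_reclassify_command(
    input_bam: str,
    kit_name: str,
    output_dir: str,
    sample_sheet: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a demux command that re-classifies and trims.

    Hypothetical helper: it leaves out both --no-classify and --no-trim,
    so demux classifies the reads itself and trimming stays enabled.
    """
    cmd = ["dorado", "demux", "--kit-name", kit_name, "--output-dir", output_dir]
    if sample_sheet is not None:
        cmd += ["--sample-sheet", sample_sheet]
    cmd.append(input_bam)
    return cmd


# Placeholder values; substitute your own kit name and paths.
cmd = build_reclassify_command("untrimmed.bam", "KIT-NAME-PLACEHOLDER", "demuxed")
print(shlex.join(cmd))
```

The command is only printed here, so it can be inspected before being passed to a shell or to `subprocess.run(cmd, check=True)`.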

Hope this helps!

phpeters commented 6 months ago

Hej @tijyojwad ,

Thanks a lot for clarifying this!

"classifications will be deterministic for the same build"

This answers exactly the question I forgot to ask. I reran the classification step and the results were the same, and I had hoped that this was deterministic / seeded.

I will proceed as suggested, thanks a lot! Philipp

nanoporetech / dorado

dorado demux trimming behaviour #661
