Open alexyfyf opened 1 week ago
Hi @alexyfyf,
Yes, this is expected behaviour. demux does not perform trimming unless it is also classifying. If you want to have untrimmed reads after basecalling, you will need to run the basecaller without --kit-name, and then classify during demux:
dorado basecaller sup $pod5 --no-trim > ${dir}/basecalled_reads.bam
dorado demux --threads 16 --output-dir ${dir}/demux --kit-name SQK-NBD114-24 ${dir}/basecalled_reads.bam
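To check whether demux actually trimmed anything, one option is to compare per-read sequence lengths before and after. This is a rough sketch, not part of dorado. It assumes samtools is installed and that read ids are unchanged by demux. The helper names `read_lengths` and `count_length_changes`, the `.tsv` file names, and the `$demuxed_bam` placeholder are hypothetical.

```shell
# Print "read_id<TAB>sequence_length" for each SAM record on stdin,
# sorted by read id so the result can be fed to join.
read_lengths() {
  awk -F'\t' '!/^@/ { print $1 "\t" length($10) }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Count reads present in both tables whose length differs.
# $1 and $2 are "read_id<TAB>length" files produced by read_lengths.
count_length_changes() {
  LC_ALL=C join -t "$(printf '\t')" "$1" "$2" |
    awk -F'\t' '$2 != $3 { n++ } END { print n + 0 }'
}

# Example usage (assumes samtools; $demuxed_bam is a placeholder for one
# of the per-barcode files written by demux):
#   samtools view "${dir}/basecalled_reads.bam" | read_lengths > before.tsv
#   samtools view "$demuxed_bam" | read_lengths > after.tsv
#   count_length_changes before.tsv after.tsv
```

A result of 0 would suggest that the reads in that file were split by barcode but not trimmed; a non-zero count suggests trimming took place.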
@malton-ont thank you for your reply. I'm still a bit confused. Does your basecall command generate the same file as mine (perhaps differing only in the tags that carry barcode information, with identical read sequences)? One more question: from what I found in your GitHub issues, classifying in demux seems to generate more usable reads. Is that what you observe as well?
Cheers,
@alexyfyf,
Yes, the only difference after the basecaller command would be the BC tags being present or not, and the RG tags will be more detailed and specific if barcoding is performed during basecalling (when barcoding with basecalling we can create read groups for the individual barcodes, while demux does not update the read tags).
There should be no real difference between the two methods regarding the sequences or other tags.
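The tag difference can be inspected directly in the SAM text. This is a small sketch, assuming samtools is available to convert the BAM to text; the helper name `count_bc_tagged` is hypothetical and not a dorado or samtools command.

```shell
# Count SAM records on stdin that carry a BC:Z: tag.
# Optional tags start at column 12 of a SAM record.
count_bc_tagged() {
  awk -F'\t' '!/^@/ {
    for (i = 12; i <= NF; i++)
      if ($i ~ /^BC:Z:/) { n++; break }
  } END { print n + 0 }'
}

# Example usage (assumes samtools is installed):
#   samtools view "${dir}/basecalled_reads.bam" | count_bc_tagged
```

Running this on the output of each basecalling variant should show whether the BC tags were written at that stage.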
Hi @alexyfyf,
Yes, this is expected behaviour. demux does not perform trimming unless it is also classifying. If you want to have untrimmed reads after basecalling, you will need to run the basecaller without --kit-name, and then classify during demux:
dorado basecaller sup $pod5 --no-trim > ${dir}/basecalled_reads.bam
dorado demux --threads 16 --output-dir ${dir}/demux --kit-name SQK-NBD114-24 ${dir}/basecalled_reads.bam
In the manual, in the section Barcode Classification > Classifying existing datasets, it says: "As with the in-line mode, --no-trim and --barcode-both-ends are also available as additional options." Does this mean that dorado demux trims barcodes, adapters and primers by default? I am confused by your comment that demux does not perform trimming unless it is also classifying.
Issue Report
Please describe the issue:
Dorado basecaller identified the barcodes without trimming, but the subsequent demux also did not trim the reads. Is this the expected behaviour? Is it possible to keep the full sequence during basecalling, but remove the barcode and adapter in demux?
Steps to reproduce the issue:
I have run dorado 0.7.0 to basecall and demux pod5 files. I used the following command for basecalling
The bam file contains the basecalled reads with Nanopore adapter and barcode information. Then I ran demux
This time I did not ask for --no-trim and I assumed the barcodes and primers would be removed, but the reads are exactly the same as in the previous bam; essentially demux just split them into separate files.
Run environment:
Logs
Basecall logs