Closed xieyy46 closed 1 month ago
Hi @xieyy46 - can you post the toml file you're using?
Are you getting any classifications at all? And are the reads short, e.g. <150 bp?
Hi, the toml file I used is shown below,
and I did get normal barcode classification results (and the reads are not short reads),
the only problem is that no matter how I modify min_soft_flank_threshold and min_hard_flank_threshold, the number of unclassified reads remains unchanged (about 12% of all reads).
Can you try with v0.5.1? There was one parameter that was not being read from the config; perhaps that's what is affecting your results.
I tried dorado v0.5.1, but again, the min_soft_flank_threshold and min_hard_flank_threshold settings in the toml file had no effect.
Can you check how long the unclassified reads are?
I checked the read lengths, but did not find anything unusual about the lengths of the unclassified reads.
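For anyone repeating this length check, here is a minimal sketch. It assumes the unclassified reads have already been dumped to SAM text (for example with samtools view) and relies only on the SAM convention that the 10th tab-separated column holds the read sequence. The function names and the example records are made up for illustration; they are not part of dorado.

```python
from statistics import median


def read_lengths(sam_lines):
    """Return the sequence length of every record in an iterable of SAM text lines."""
    lengths = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        seq = fields[9]  # column 10 of a SAM record is SEQ
        if seq != "*":   # '*' means the sequence is not stored
            lengths.append(len(seq))
    return lengths


def summarize(lengths):
    """Small summary to compare classified and unclassified reads side by side."""
    if not lengths:
        return None
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": median(lengths),
        "max": max(lengths),
    }


# Hypothetical records, only to show the expected input shape.
example = [
    "@HD\tVN:1.6",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(summarize(read_lengths(example)))
```

Running the same summary on a classified barcode file and on the unclassified file makes it easy to see whether the unclassified reads are systematically shorter.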
Can you conveniently test the issue with your own data?
Gotcha, yeah. You can debug in more detail by looking at the alignment for a specific read. e.g. if you pick a read id from the unclassified bam, you can run
$ echo <read-id> > reads.txt
$ dorado demux dorado/calls.bam --barcode-arrangement dorado/custom_barcodes.toml --barcode-sequences dorado/custom_barcodes.fa --output-dir dorado/demux -t 25 --read-ids reads.txt -vv
This will run dorado in trace mode and output detailed alignments and scoring, etc. Would be interesting to see what's happening
Hi @xieyy46 any updates on this?
Hi, I apologize for the delay. I'll check this later.
Hi, I just selected one read to run dorado demux in trace mode.
I set the scoring options in the toml file as shown below:
[scoring]
min_soft_barcode_threshold = 0.2
min_hard_barcode_threshold = 0.2
min_soft_flank_threshold = 0.4
min_hard_flank_threshold = 0.4
min_barcode_score_dist = 0.05
The actual flank score is 0.95238096, and the scores for each barcode are: 0.583333 BC01, 0.5 BC03, 0.416667 BC02, 0.375 BC04. Both the flank score and the barcode scores are above the thresholds, and the score difference between the best and second-best barcode is above 0.05. So this read should be classified as BC01. However, dorado left this read unclassified.
So were the scoring options set in the toml file not picked up by dorado?
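The expectation described here can be written out as a short check. This is only a sketch of the reporter's reading of the three thresholds, not dorado's actual classification code: the function name is hypothetical, and the soft/hard distinction is ignored because both were set to the same value in this config.

```python
def expected_barcode(flank_score, barcode_scores, scoring):
    """Return the barcode the reporter expects, or None if a threshold is missed.

    Assumed interpretation (not taken from dorado's source): every value in
    `scoring` is a minimum that the corresponding score must reach.
    """
    ranked = sorted(barcode_scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_name, best), (_, second) = ranked[0], ranked[1]

    if flank_score < scoring["min_soft_flank_threshold"]:
        return None  # flank alignment too weak
    if best < scoring["min_soft_barcode_threshold"]:
        return None  # best barcode alignment too weak
    if best - second < scoring["min_barcode_score_dist"]:
        return None  # best and runner-up are too close to call
    return best_name


# Values copied from the trace output quoted in this thread.
scoring = {
    "min_soft_barcode_threshold": 0.2,
    "min_soft_flank_threshold": 0.4,
    "min_barcode_score_dist": 0.05,
}
scores = {"BC01": 0.583333, "BC03": 0.5, "BC02": 0.416667, "BC04": 0.375}

print(expected_barcode(0.95238096, scores, scoring))  # prints BC01
```

Under this reading the read passes all three checks (the best-to-second gap is about 0.083), which is why an unclassified result suggests the configured values were not the ones actually applied.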
By the way, I noticed another issue: the candidate barcode sequence extracted before mask1_rear is shifted one base towards the rear sequence. Could this be an error in the dorado demux code?
An example is shown below:
the candidate barcode sequence should be "AAAAAAGTTGTCGGTGTCTTTGTG",
however, dorado extracted "AAAAAGTTGTCGGTGTCTTTGTGC",
one base shift?
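The offset between the two sequences can be confirmed mechanically. The helper below is a hypothetical illustration, not dorado code: it slides one equal-length window against the other and reports by how many bases the extracted window is displaced.

```python
def window_shift(expected, extracted, max_shift=3):
    """Return how far `extracted` is shifted relative to `expected`.

    Positive k: the extracted window starts k bases later (towards the rear).
    Negative k: it starts k bases earlier. None: no shift up to max_shift
    explains the difference. Both windows must have the same length.
    """
    if len(expected) != len(extracted):
        raise ValueError("windows must have the same length")
    if expected == extracted:
        return 0
    n = len(expected)
    for k in range(1, max_shift + 1):
        # Extracted window moved k bases towards the rear: overlaps must match.
        if expected[k:] == extracted[: n - k]:
            return k
        # Extracted window moved k bases towards the front.
        if extracted[k:] == expected[: n - k]:
            return -k
    return None


expected = "AAAAAAGTTGTCGGTGTCTTTGTG"   # what the reporter expected
extracted = "AAAAAGTTGTCGGTGTCTTTGTGC"  # what dorado extracted per the trace

print(window_shift(expected, extracted))  # prints 1
```

A result of 1 matches the description above: the extracted window drops the first base of the expected barcode and picks up one extra base from the rear flank.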
Ah thanks for posting the trace details, now I can see what's going on. There are 2 issues here (both would need to be addressed in another build of dorado) -
min(config value, 0.1).
Hi dorado team, Thank you for your excellent work! I have tried to demultiplex my reads using custom barcodes, by specifying the custom_barcodes.toml and custom_barcodes.fa. However, I found that no matter how I modify min_soft_flank_threshold and min_hard_flank_threshold (I tried 0.05, 0.1, 0.2, 0.3, 0.5), the number of unclassified reads remains unchanged.
the command line: dorado demux dorado/calls.bam --barcode-arrangement dorado/custom_barcodes.toml --barcode-sequences dorado/custom_barcodes.fa --output-dir dorado/demux -t 25