Closed shair89 closed 1 month ago
Hi @shair89 - we are working on an urgent fix to the issue and will release a patch within the next day or so. Thanks for your patience!
@shair89 aside from v0.6.1 and earlier, we've also noticed a difference between using dorado and the basecall server! Interesting that you've seen the same.
@tijyojwad if we've already barcode classified with no-trim on basecaller, how should we "re-barcode classify" on the demux step? I'm guessing there's some way to override the old barcode call.
@shair89 I forgot to update this thread! v0.6.2 was released with the patch fix for the low classification rate. Dorado is now at 0.7.0 which also contains the fix.
@billytcl - unfortunately, for the RBK signal the --no-trim didn't apply. So you'll need to re-basecall to get the RBK improvements.
Dorado 0.6.1 is not successfully assigning barcodes/demultiplexing reads when basecalling with kit SQK-RBK114-24.
We have tested basecalling the same small POD5 file and have varying results:
Dorado 0.6.1: Barcode 19 = 129 reads, Barcode 20 = 188 reads, Unclassified = 3124 reads
Dorado 0.5.3: Barcode 19 = 838 reads, Barcode 20 = 1043 reads, Unclassified = 1557 reads
basecall_server-7.3.9 (MinKNOW): Barcode 19 = 907 reads, Barcode 20 = 1116 reads, Unclassified = 936 reads
A small number of reads were assigned to other barcodes that weren't actually used in the experiment; these assignments varied between the tests.
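To make the three runs easier to compare, here is a rough sketch that turns the counts above into an unclassified percentage. The helper name unclassified_pct is made up for illustration and is not part of Dorado. It only uses the three counts listed for each run, so the small number of reads assigned to unused barcodes is left out of the denominator and the figures are approximate.

```shell
#!/bin/sh
# Hypothetical helper (not a Dorado command): percentage of the listed reads
# that were left unclassified.
# Arguments: barcode19_reads barcode20_reads unclassified_reads
unclassified_pct() {
    awk -v a="$1" -v b="$2" -v u="$3" \
        'BEGIN { printf "%.1f\n", 100 * u / (a + b + u) }'
}

# Counts copied from the results listed in this report.
echo "Dorado 0.6.1:          $(unclassified_pct 129 188 3124)% unclassified"
echo "Dorado 0.5.3:          $(unclassified_pct 838 1043 1557)% unclassified"
echo "basecall_server-7.3.9: $(unclassified_pct 907 1116 936)% unclassified"
```

On these counts, roughly 91% of the listed reads are unclassified with Dorado 0.6.1, compared with about 45% for Dorado 0.5.3 and about 32% for the basecall server.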
We have tried the barcode classification using the basecaller command and the demux command separately (using --no-trim during basecalling) and obtained similar results.
Steps to reproduce the issue:
Please list any steps to reproduce the issue.
Run environment:
Dorado version: 0.6.1+79b5da5
Dorado command:
dorado basecaller --kit-name SQK-RBK114-24 sup,5mCG_5hmCG ./pod5 > basecalled.bam
dorado demux --output-dir ./demuxed/ --no-classify basecalled.bam
Operating system: Ubuntu 22.04.4 LTS 64bit
Hardware (CPUs, Memory, GPUs): Intel® Xeon(R) W-2255 CPU @ 3.70GHz × 20, 128GB RAM, NVIDIA QUADRO RTX 6000
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Source data location (on device or networked drive - NFS, etc.): device
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): R10, RBK114-24, 3446 reads, 1.7GB pod5