Closed. VBHerrenC closed this issue 1 month ago.
Hi @VBHerrenC, the v5 models use a new transformer architecture, and there is still some work to do to tune the automatic batch size calculation for a broader range of hardware.
We can see from your log that the auto batch size calculation has chosen 288:
[2024-05-28 11:52:07.211] [info] cuda:0 using chunk size 12288, batch size 288
Could you try manually setting this slightly lower, to --batchsize 256 or 224, to reduce memory consumption?
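For example, applied to the command from the report below, that would look something like this (everything else unchanged; 256 is just a starting point):

~/packages/dorado-0.7.0-linux-x64/bin/dorado basecaller ~/packages/dorado-0.7.0-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup@v5.0.0 /home/pod5 --kit-name SQK-RBK114-24 --min-qscore 14 --trim all --device cuda:all --batchsize 256 > all_reads_sup_qFilter.bam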
Kind regards, Rich
Even when using --batchsize 256, the GPU is barely doing anything; all GPU monitoring tools show no activity. While the model is no longer running out of VRAM, it still is not making any progress. In addition, the batch size has dropped to about half of what the previous model used.
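For context, GPU activity can be watched while dorado runs with a simple poll of nvidia-smi (assuming nvidia-smi is available under WSL2, as it is with recent NVIDIA drivers):

watch -n 1 nvidia-smi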
It is moving extremely slowly (about 2x slower than the v4.3.0 model, predicting 12 hours instead of 6), but it is writing out to the BAM file and hasn't crashed yet, unlike the previous run without the smaller batch size. Thanks!
We're expecting it to be 2x slower at the moment - there's much more optimisation to come to bring it closer to v4.3.0 sup speed. Closing as resolved but we'll continue to improve performance and stability.
Issue Report
Please describe the issue:
When running my DNA dataset with Dorado 0.7.0 and the dna_r10.4.1_e8.2_400bps_sup@v5.0.0 model, a CUDA out-of-memory error is generated after about 8 minutes of basecalling. This is odd because I was able to successfully basecall this dataset last week with Dorado 0.6.0 and the v4.3.0 model. Additionally, basecalling appears to proceed normally when the same command is run with Dorado 0.7.0 and the v4.3.0 model. During troubleshooting with Dorado 0.7.0 and the v5.0.0 model, basecalling also completes with both --device cuda:all and --device cpu when --max-reads is set to 10. So it seems to be an issue with the full dataset and the v5.0.0 model, perhaps with how the batch sizes are being set? Any ideas would be appreciated!
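For reference, the reduced-read sanity check mentioned above looks roughly like this (the output filename is illustrative):

~/packages/dorado-0.7.0-linux-x64/bin/dorado basecaller ~/packages/dorado-0.7.0-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup@v5.0.0 /home/pod5 --kit-name SQK-RBK114-24 --min-qscore 14 --trim all --device cuda:all --max-reads 10 > test_10_reads.bam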
Steps to reproduce the issue:
Please list any steps to reproduce the issue.
Run environment:
Dorado version: 0.7.0
Dorado command: ~/packages/dorado-0.7.0-linux-x64/bin/dorado basecaller ~/packages/dorado-0.7.0-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup@v5.0.0 /home/pod5 --kit-name SQK-RBK114-24 --min-qscore 14 --trim all --device cuda:all > all_reads_sup_qFilter.bam
Operating system: WSL
Hardware (CPUs, Memory, GPUs): NVIDIA RTX A5000
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Source data location (on device or networked drive - NFS, etc.): on device
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): FLO-MIN114, SQK-RBK114-24, N50 4.99 kb, 1.32 M reads, 66 GB
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs