Closed: samuelmontgomery closed this issue 4 months ago
Hi @samuelmontgomery - the 2 chunk sizes are for dorado to handle shorter reads in a more performant manner, since running shorter reads with the big chunks led to under utilization of the GPU.
It would help to understand the read distribution in your dataset to see what's going on. Could you share that?
Hi @tijyojwad - the subset of reads I am running has an N50 of 2.75 kb, but has a lot of shorter reads (as it's just the first ~10,000 reads from the run)
I find that with the default batch size selected for the GPU (even with 0.5.3), the GPU reports a 100% load but doesn't seem to actually utilise its capacity, staying at ambient temperatures (~30C). On 0.5.3, when selecting a smaller batch size, it also runs at 100% load but actually seems to be working, with the temperature at ~65C during basecalling.
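For reference on the read-distribution question, N50 can be computed directly from a list of read lengths; a minimal sketch (the lengths below are made up for illustration, not from this dataset):

```python
def n50(read_lengths):
    """N50: the largest length L such that reads of length >= L
    together contain at least half of the total bases."""
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Hypothetical read-length distribution (bases)
lengths = [500, 800, 1200, 2750, 3000, 5000]
print(n50(lengths))  # → 3000
```

A dataset dominated by short reads can have a much lower mean length than its N50 suggests, which is why the full distribution matters here.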
I am not seeing a drastic reduction in speed when comparing 0.5.3 and 0.4.3 on my 4x L40S, which has a similar architecture to the 4090.
But then I am only running the v4.3.0 5mCG_5hmCG@v1 model and no 6mA.
Hi @samuelmontgomery - sorry for the late reply! I was wondering if you could do this experiment without mod basecalling? Trying to see if it's a canonical basecall speed issue or mod calling
thanks @tijyojwad
I have repeated the same set of reads without modifications (just sup@v4.3.0) on both 0.6.0 and 0.5.3, with --batchsize set to 704 and with auto batch size, and with the 5mC_5hmC and 6mA modifications run individually and together.
0.6.0 ranges from ~2x to ~87x slower than 0.5.3
0.5.3 (batch size 704)

```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.5.3_sup.bam
[2024-04-19 11:10:08.141] [info] > Simplex reads basecalled: 8940
[2024-04-19 11:10:08.141] [info] > Simplex reads filtered: 9
[2024-04-19 11:10:08.141] [info] > Basecalled @ Samples/s: 7.909272e+06
[2024-04-19 11:10:08.141] [info] > 8940 reads demuxed @ classifications/s: 4.958128e+02
[2024-04-19 11:10:08.180] [info] > Finished
```
0.6.0 (batch size 704)

```
C:\dorado_0.6.0\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.6_sup.bam
[2024-04-19 11:14:08.482] [info] > Simplex reads basecalled: 8926
[2024-04-19 11:14:08.482] [info] > Simplex reads filtered: 23
[2024-04-19 11:14:08.482] [info] > Basecalled @ Samples/s: 3.970008e+06
[2024-04-19 11:14:08.482] [info] > 9100 reads demuxed @ classifications/s: 3.073078e+02
[2024-04-19 11:14:08.498] [info] > Finished
```
As you can see, 0.6.0 is ~2x slower than 0.5.3 with batch size 704
0.5.3 (auto batch)

```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-RBK114-24 --recursive .\pod5_pass > .\batch_0.5.3_sup.bam
[2024-04-19 11:18:17.036] [info] > Simplex reads basecalled: 8940
[2024-04-19 11:18:17.036] [info] > Simplex reads filtered: 9
[2024-04-19 11:18:17.036] [info] > Basecalled @ Samples/s: 8.650497e+06
[2024-04-19 11:18:17.036] [info] > 8940 reads demuxed @ classifications/s: 5.422783e+02
[2024-04-19 11:18:17.068] [info] > Finished
```
0.6.0 (auto batch)

```
C:\dorado\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-RBK114-24 --recursive .\pod5_pass > .\batch_0.6_sup.bam
[2024-04-19 11:22:58.855] [info] > Simplex reads basecalled: 8926
[2024-04-19 11:22:58.855] [info] > Simplex reads filtered: 23
[2024-04-19 11:22:58.856] [info] > Basecalled @ Samples/s: 9.344314e+05
[2024-04-19 11:22:58.856] [info] > 9100 reads demuxed @ classifications/s: 7.233187e+01
[2024-04-19 11:22:58.873] [info] > Finished
```
As you can see, this time 0.6.0 was ~10x slower than 0.5.3
0.5.3 (5mC_5hmC, batch size 704)

```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.5.3_5mC.bam
[==============================] 100% [00m:26s<00m:00s]
[2024-04-19 11:32:00.505] [info] > Simplex reads basecalled: 8940
[2024-04-19 11:32:00.505] [info] > Simplex reads filtered: 9
[2024-04-19 11:32:00.505] [info] > Basecalled @ Samples/s: 5.170664e+06
[2024-04-19 11:32:00.505] [info] > 8940 reads demuxed @ classifications/s: 3.241362e+02
[2024-04-19 11:32:00.541] [info] > Finished
```
0.6.0 (5mC_5hmC, batch size 704)

```
C:\dorado\bin\dorado.exe basecaller sup,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.6_5mC.bam
[2024-04-19 11:49:57.575] [info] > Simplex reads basecalled: 8926
[2024-04-19 11:49:57.575] [info] > Simplex reads filtered: 23
[2024-04-19 11:49:57.576] [info] > Basecalled @ Samples/s: 1.218453e+05
[2024-04-19 11:49:57.576] [info] > 9100 reads demuxed @ classifications/s: 9.431723e+00
[2024-04-19 11:49:57.598] [info] > Finished
```
Again, 0.6.0 is ~42x slower than 0.5.3 with 5mC_5hmC modification
0.5.3 (6mA, batch size 704)

```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup,6mA --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.5.3_6mA.bam
[2024-04-19 12:55:58.322] [info] > Simplex reads basecalled: 8940
[2024-04-19 12:55:58.322] [info] > Simplex reads filtered: 9
[2024-04-19 12:55:58.323] [info] > Basecalled @ Samples/s: 4.300856e+06
[2024-04-19 12:55:58.323] [info] > 8940 reads demuxed @ classifications/s: 2.696101e+02
[2024-04-19 12:55:58.371] [info] > Finished
```
0.6.0 (6mA, batch size 704)

```
C:\dorado\bin\dorado.exe basecaller sup,6mA --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.6_6mA.bam
[2024-04-19 13:25:14.644] [info] > Simplex reads basecalled: 8926
[2024-04-19 13:25:14.644] [info] > Simplex reads filtered: 23
[2024-04-19 13:25:14.644] [info] > Basecalled @ Samples/s: 7.922124e+04
[2024-04-19 13:25:14.644] [info] > 9100 reads demuxed @ classifications/s: 6.132307e+00
[2024-04-19 13:25:14.665] [info] > Finished
```
With 6mA modification, 0.6.0 is ~54x slower than 0.5.3
0.5.3 (5mC_5hmC, 6mA, batch size 704)

```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup,6mA,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.5.3_6mA_5mC.bam
[2024-04-19 13:28:17.593] [info] > Simplex reads basecalled: 8940
[2024-04-19 13:28:17.593] [info] > Simplex reads filtered: 9
[2024-04-19 13:28:17.593] [info] > Basecalled @ Samples/s: 3.546153e+06
[2024-04-19 13:28:17.593] [info] > 8940 reads demuxed @ classifications/s: 2.222996e+02
[2024-04-19 13:28:17.632] [info] > Finished
```
0.6.0 (5mC_5hmC, 6mA, batch size 704)

```
C:\dorado\bin\dorado.exe basecaller sup,6mA,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.6_6mA_5mC.bam
[2024-04-19 14:19:20.090] [info] > Simplex reads basecalled: 8926
[2024-04-19 14:19:20.090] [info] > Simplex reads filtered: 23
[2024-04-19 14:19:20.090] [info] > Basecalled @ Samples/s: 4.071408e+04
[2024-04-19 14:19:20.090] [info] > 9100 reads demuxed @ classifications/s: 3.151570e+00
[2024-04-19 14:19:20.115] [info] > Finished
```
Again, with both 5mC_5hmC & 6mA modification, 0.6.0 is ~87x slower than 0.5.3
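The slowdown factors quoted above come straight from the logged Samples/s values; computing them all in one place:

```python
# Samples/s pairs taken from the logs above: (0.5.3, 0.6.0)
runs = {
    "sup, batch 704":        (7.909272e6, 3.970008e6),
    "sup, auto batch":       (8.650497e6, 9.344314e5),
    "sup,5mC_5hmC, 704":     (5.170664e6, 1.218453e5),
    "sup,6mA, 704":          (4.300856e6, 7.922124e4),
    "sup,6mA,5mC_5hmC, 704": (3.546153e6, 4.071408e4),
}
for name, (v053, v060) in runs.items():
    # Slowdown factor = 0.5.3 throughput / 0.6.0 throughput
    print(f"{name}: 0.6.0 is ~{v053 / v060:.0f}x slower")
```

This reproduces the ~2x, ~42x, ~54x and ~87x figures above (the auto-batch case comes out at ~9x).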
Hi @samuelmontgomery - thank you for posting the response! I can see that your dataset is pretty small - only runs for about 26s. In our experience that small a dataset can show quite a bit of variability from run to run. Do you have a longer one to test with, say around 50k reads?
I'm also curious to see what batch size the auto batch size was determining and whether that was different between 0.5.3 and 0.6.0 (I don't see it in the logs).
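When the batch size isn't obvious from the console output, it can be recovered from the log text itself; a small sketch that parses the "using chunk size …, batch size …" lines dorado 0.6.0 prints (the sample lines here are copied from later in this thread):

```python
import re

log = (
    "[2024-04-22 14:26:49.450] [info] cuda:0 using chunk size 9996, batch size 896\n"
    "[2024-04-22 14:26:50.342] [info] cuda:0 using chunk size 4998, batch size 1024\n"
)

# Extract (chunk size, batch size) pairs from each "using chunk size" line
pairs = re.findall(r"chunk size (\d+), batch size (\d+)", log)
print(pairs)  # → [('9996', '896'), ('4998', '1024')]
```

The same pattern works on a saved stderr log from a `--verbose` run; the exact log wording is taken from the 0.6.0 output quoted in this thread and may differ in other versions.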
thanks @tijyojwad - will run a dataset of ~50k reads with some of these tests
auto batch size for 0.5.3 is 832, auto batch size for 0.6.0 is:
0.5.3 sup only - batch size 704
```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.5.3_sup.bam
[2024-04-22 12:52:20.101] [info] > Simplex reads basecalled: 39366
[2024-04-22 12:52:20.101] [info] > Basecalled @ Samples/s: 9.669210e+06
[2024-04-22 12:52:20.101] [info] > 39366 reads demuxed @ classifications/s: 1.069586e+02
[2024-04-22 12:52:20.134] [info] > Finished
```
0.6.0 sup only - batch size 704

```
C:\dorado\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass > .\batch_0.6.0_sup.bam
[2024-04-22 12:59:14.949] [info] > Simplex reads basecalled: 39362
[2024-04-22 12:59:14.950] [info] > Simplex reads filtered: 4
[2024-04-22 12:59:14.950] [info] > Basecalled @ Samples/s: 9.066649e+06
[2024-04-22 12:59:14.950] [info] > 39722 reads demuxed @ classifications/s: 1.253447e+02
[2024-04-22 12:59:14.972] [info] > Finished
```
0.5.3 sup only - batch size auto (batch size 896)

```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-RBK114-24 --recursive .\pod5_pass > .\batch_0.5.3_sup_auto.bam
[2024-04-22 13:06:01.835] [info] > Simplex reads basecalled: 39366
[2024-04-22 13:06:01.835] [info] > Basecalled @ Samples/s: 1.056054e+07
[2024-04-22 13:06:01.835] [info] > 39366 reads demuxed @ classifications/s: 1.168183e+02
[2024-04-22 13:06:01.882] [info] > Finished
```
0.6.0 sup only - batch size auto (896, 1792)

```
C:\dorado\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-RBK114-24 --recursive .\pod5_pass > .\batch_0.6.0_sup_auto.bam
[2024-04-22 13:11:48.247] [info] > Simplex reads basecalled: 39362
[2024-04-22 13:11:48.247] [info] > Simplex reads filtered: 4
[2024-04-22 13:11:48.247] [info] > Basecalled @ Samples/s: 1.039962e+07
[2024-04-22 13:11:48.248] [info] > 39725 reads demuxed @ classifications/s: 1.437837e+02
[2024-04-22 13:11:48.273] [info] > Finished
```
0.5.3 sup, 5mC_5hmC - batch size auto (896)
```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive .\pod5_pass\ > .\batch_0.5.3_5mC_auto.bam
[2024-04-22 13:37:27.759] [info] > Simplex reads basecalled: 39366
[2024-04-22 13:37:27.759] [info] > Basecalled @ Samples/s: 3.430220e+06
[2024-04-22 13:37:27.759] [info] > 39366 reads demuxed @ classifications/s: 3.794430e+01
[2024-04-22 13:37:27.820] [info] > Finished
```
0.5.3 sup, 5mC_5hmC - batch size 704
```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass\ > .\batch_0.5.3_5mC.bam
[====================] 100% [10m:37s<00m:00s] Basecalling
[2024-04-22 14:17:48.206] [info] > Simplex reads basecalled: 39366
[2024-04-22 14:17:48.206] [info] > Basecalled @ Samples/s: 5.569421e+06
[2024-04-22 14:17:48.206] [info] > 39366 reads demuxed @ classifications/s: 6.160766e+01
[2024-04-22 14:17:48.279] [info] > Finished
```
0.6.0 sup, 5mC_5hmC - batch size auto (896, 1024)
```
C:\dorado\bin\dorado.exe basecaller sup,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive .\pod5_pass\ > .\batch_0.6.0_5mC_auto.bam
[2024-04-22 14:26:37.269] [info] > Creating basecall pipeline
[2024-04-22 14:26:49.450] [info] cuda:0 using chunk size 9996, batch size 896
[2024-04-22 14:26:50.342] [info] cuda:0 using chunk size 4998, batch size 1024
[> ] 2% [12m:23s<10h:06m:47s] Basecalling
```
0.6.0 sup,5mC_5hmC was only 2% complete in the time it took 0.5.3 sup,5mC_5hmC to finish basecalling, but the two were equivalent with just canonical basecalling.
Looking at 6mA, as this is where I previously found performance was reduced from 6mA@v1 to 6mA@v2 as well (in #660)
6mA 0.5.3 batch size auto
```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup,6mA --no-trim --kit-name SQK-RBK114-24 --recursive .\pod5_pass\ > .\batch_0.5.3_6mA_auto.bam
[2024-04-22 15:51:10.710] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0 with httplib
[2024-04-22 15:51:13.240] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0_6mA@v2 with httplib
[2024-04-22 15:51:14.246] [info] > Creating basecall pipeline
[2024-04-22 15:51:20.298] [info] - set batch size for cuda:0 to 832
[2024-04-22 15:51:20.305] [info] Barcode for SQK-RBK114-24
[=> ] 9% [10m:00s<01h:41m:10s]
```
Only 9% completed after 10 minutes using 0.5.3 with auto batch size
6mA 0.5.3 batch size 704
```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup,6mA --no-trim --kit-name SQK-RBK114-24 --recursive --batchsize 704 .\pod5_pass\ > .\batch_0.5.3_6mA.bam
[2024-04-22 16:02:11.437] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0 with httplib
[2024-04-22 16:02:14.547] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0_6mA@v2 with httplib
[2024-04-22 16:02:15.089] [info] > Creating basecall pipeline
[2024-04-22 16:02:18.238] [info] Barcode for SQK-RBK114-24
[==============================] 100% [10m:55s<00m:00s]
[2024-04-22 16:13:15.040] [info] > Simplex reads basecalled: 39366
[2024-04-22 16:13:15.040] [info] > Basecalled @ Samples/s: 5.418454e+06
[2024-04-22 16:13:15.040] [info] > 39366 reads demuxed @ classifications/s: 5.993770e+01
[2024-04-22 16:13:15.104] [info] > Finished
```
Basecalling sup,6mA with batch size 704 reduces the basecalling time ~10x compared to auto batch size (832); samples/s is equivalent to the 5mC_5hmC model with batch size set to 704.
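The ~10x figure can be sanity-checked from the progress bars by linearly extrapolating elapsed time over the fraction complete; a quick sketch with the numbers above:

```python
def projected_total_minutes(elapsed_min, fraction_done):
    """Naive linear extrapolation of total runtime from a progress bar."""
    return elapsed_min / fraction_done

# 0.5.3 sup,6mA, auto batch size: 9% done after 10 minutes
auto = projected_total_minutes(10, 0.09)  # ~111 min projected
# 0.5.3 sup,6mA, batch size 704: finished in 10m55s
fixed = 10 + 55 / 60                      # ~10.9 min actual
print(f"~{auto / fixed:.0f}x faster with batch size 704")
```

Linear extrapolation is crude (throughput varies as the read-length mix changes through a run), but it's close enough to confirm the order-of-magnitude difference here.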
6mA 0.6.0 batch size auto
```
C:\dorado\bin\dorado.exe basecaller sup,6mA --no-trim --kit-name SQK-RBK114-24 --recursive .\pod5_pass\ > .\batch_0.6.0_6mA_auto.bam
[2024-04-22 16:14:47.238] [info] > Creating basecall pipeline
[2024-04-22 16:14:59.101] [info] cuda:0 using chunk size 9996, batch size 896
[2024-04-22 16:14:59.903] [info] cuda:0 using chunk size 4998, batch size 1024
[=> ] 5% [17m:16s<05h:28m:12s] Basecalling
```
6mA 0.6.0 batch size 704
```
[2024-04-22 16:33:22.810] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0 with httplib
[2024-04-22 16:33:25.196] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v4.3.0_6mA@v2 with httplib
[2024-04-22 16:33:25.651] [info] > Creating basecall pipeline
[2024-04-22 16:33:27.937] [info] cuda:0 using chunk size 9996, batch size 704
[2024-04-22 16:33:28.668] [info] cuda:0 using chunk size 4998, batch size 704
[========> ] 27% [01h:15m:52s<03h:25m:08s] Basecalling
```
Hi,
Just following up now that the new models (sup@v5.0.0) have been released with dorado v0.7.0
Using v0.7.0 with the sup@v5.0.0 model, dorado is basecalling at ~33% of the samples/s compared to both v0.5.3 and v0.6.0 with sup@v4.3.0 models
Additionally, with the new models the chunk size has changed compared to sup@v4.3.0 - will that result in different basecalled data compared to the older models? I know chunk size is important for the reproducibility of basecalling.
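On why chunk size can matter for reproducibility: chunking determines where the raw signal is split before being fed to the network, so a different chunk size puts the chunk boundaries in different places along each read. A toy illustration of how the boundaries move (this is not dorado's actual chunking logic, which also overlaps and stitches chunks; it's just the general idea):

```python
def chunk_boundaries(signal_len, chunk_size):
    """Start offsets of consecutive non-overlapping chunks (toy model)."""
    return list(range(0, signal_len, chunk_size))

# Chunk sizes taken from the logs above (v4.3.0 vs v5.0.0)
print(chunk_boundaries(30000, 9996))   # → [0, 9996, 19992, 29988]
print(chunk_boundaries(30000, 12288))  # → [0, 12288, 24576]
```

Since the model scores each chunk independently before the results are stitched, different boundary placement can in principle yield slightly different basecalls near the joins.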
sup@v5.0.0
```
[2024-05-22 13:10:10.008] [info] cuda:0 using chunk size 12288, batch size 128
[2024-05-22 13:10:10.278] [info] cuda:0 using chunk size 6144, batch size 128
```
sup@v4.3.0
```
[2024-05-22 16:18:06.989] [info] cuda:0 using chunk size 9996, batch size 896
[2024-05-22 16:18:07.787] [info] cuda:0 using chunk size 4998, batch size 1024
```
Some benchmarking below
0.5.3 sup@v4.3.0 auto batchsize
```
C:\dorado_0.5.3\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-NBD114-96 --recursive .\pod5_pass\barcode01 > .\basecalling_test\barcode01_batch5.3.bam
[2024-05-22 15:01:33.838] [info] > Simplex reads basecalled: 45738
[2024-05-22 15:01:33.838] [info] > Simplex reads filtered: 6
[2024-05-22 15:01:33.838] [info] > Basecalled @ Samples/s: 1.074588e+07
[2024-05-22 15:01:33.838] [info] > 45883 reads demuxed @ classifications/s: 1.317360e+02
[2024-05-22 15:01:33.882] [info] > Finished
```
0.7.0 sup@v5.0.0 auto batchsize
```
C:\dorado_0.7.0\bin\dorado.exe basecaller sup --no-trim --kit-name SQK-NBD114-96 --recursive .\pod5_pass\barcode01 > .\basecalling_test\barcode01.bam
[2024-05-22 13:10:00.651] [info] > Creating basecall pipeline
[2024-05-22 13:10:10.008] [info] cuda:0 using chunk size 12288, batch size 128
[2024-05-22 13:10:10.278] [info] cuda:0 using chunk size 6144, batch size 128
[==============================] 100% [16m:46s<00m:00s] Sorting output files
[2024-05-22 10:07:39.284] [info] > Simplex reads basecalled: 45738
[2024-05-22 10:07:39.284] [info] > Basecalled @ Samples/s: 3.715079e+06
[2024-05-22 10:07:39.284] [info] > 45882 reads demuxed @ classifications/s: 4.554293e+01
[2024-05-22 10:07:39.352] [info] > Finished
```
0.7.0 sup@v4.3.0 auto batchsize
```
C:\dorado_0.7.0\bin\dorado.exe basecaller sup@v4.3.0 --no-trim --kit-name SQK-NBD114-96 --recursive .\pod5_pass\barcode01 > .\basecalling_test\barcode01_batch4.3.bam
[2024-05-22 15:45:55.210] [info] > Creating basecall pipeline
[2024-05-22 16:18:06.989] [info] cuda:0 using chunk size 9996, batch size 896
[2024-05-22 16:18:07.787] [info] cuda:0 using chunk size 4998, batch size 1024
[==============================] 100% [05m:59s<00m:00s] Sorting output files
[2024-05-22 16:13:30.982] [info] > Simplex reads basecalled: 45738
[2024-05-22 16:13:30.982] [info] > Simplex reads filtered: 6
[2024-05-22 16:13:30.983] [info] > Basecalled @ Samples/s: 1.036893e+07
[2024-05-22 16:13:30.983] [info] > 45883 reads demuxed @ classifications/s: 1.271149e+02
[2024-05-22 16:13:31.004] [info] > Finished
```
As you can see, the sup@v5.0.0 model runs at ~1/3 the speed of the sup@v4.3.0 model regardless of dorado version. I will test with modifications as well; maybe the new transformer models make up time there.
There is only a ~25% drop in speed when running the 4mC_5mC modified model compared to canonical alone (~39% with 4mC_5mC & 6mA together) - meaning that while there is a fairly large drop in canonical basecalling speed when using sup@v5.0.0 compared to sup@v4.3.0, adding modified basecalling costs comparatively little on top.
```
C:\dorado_0.7.0\bin\dorado.exe basecaller sup,4mC_5mC --no-trim --kit-name SQK-NBD114-96 --recursive .\pod5_pass\barcode01 > .\barcode01_4mC.bam
[2024-05-23 13:19:10.755] [info] cuda:0 using chunk size 12288, batch size 128
[2024-05-23 13:19:11.050] [info] cuda:0 using chunk size 6144, batch size 128
[2024-05-22 16:52:22.147] [info] > Simplex reads basecalled: 45738
[2024-05-22 16:52:22.147] [info] > Basecalled @ Samples/s: 2.778261e+06
[2024-05-22 16:52:22.147] [info] > 45882 reads demuxed @ classifications/s: 3.405854e+01
[2024-05-22 16:52:22.208] [info] > Finished
```
```
C:\dorado_0.7.0\bin\dorado.exe basecaller sup,4mC_5mC,6mA --no-trim --kit-name SQK-NBD114-96 --recursive .\pod5_pass\barcode01 > .\barcode01_4mC_6mA.bam
[2024-05-23 13:19:10.755] [info] cuda:0 using chunk size 12288, batch size 128
[2024-05-23 13:19:11.050] [info] cuda:0 using chunk size 6144, batch size 128
[==============================] 100% [27m:33s<00m:00s] Sorting output files
[2024-05-23 11:58:58.339] [info] > Simplex reads basecalled: 45738
[2024-05-23 11:58:58.339] [info] > Basecalled @ Samples/s: 2.262337e+06
[2024-05-23 11:58:58.339] [info] > 45882 reads demuxed @ classifications/s: 2.773385e+01
[2024-05-23 11:58:58.415] [info] > Finished
```
```
C:\dorado_0.7.0\bin\dorado.exe basecaller sup,6mA --no-trim --kit-name SQK-NBD114-96 --recursive .\pod5_pass\barcode01 > .\barcode01_6mA.bam
[2024-05-23 13:19:10.755] [info] cuda:0 using chunk size 12288, batch size 128
[2024-05-23 13:19:11.050] [info] cuda:0 using chunk size 6144, batch size 128
[==============================] 100% [23m:35s<00m:00s] Sorting output files
[2024-05-23 13:42:48.060] [info] > Simplex reads basecalled: 45738
[2024-05-23 13:42:48.060] [info] > Basecalled @ Samples/s: 2.643497e+06
[2024-05-23 13:42:48.060] [info] > 45882 reads demuxed @ classifications/s: 3.240648e+01
[2024-05-23 13:42:48.136] [info] > Finished
```
Hi @samuelmontgomery, the sup@v5.0.0 models use a new model architecture which we are still optimising. You can expect to see improvements in sup v5 basecalling speed in future dorado releases.
Thanks @HalfPhoton - I will close this, as it was more related to modified basecalling and I think 0.7.0 has solved the massive slowdown I was seeing with 0.6.0 & 0.6.1; it is now comparable to the speed of 0.5.3, with obvious benefits!
Hi, I had optimised the batch size parameter (as noted in #660) using a test subset of my data which saw drastic improvements in 0.5.3 (Samples/s: 1.034746e+05 vs Samples/s: 3.626776e+06)
I have updated to dorado v0.6.0 running the same dataset, and I am seeing a drastic reduction in basecalling speed
Run on 0.5.3:

```
\dorado.exe basecaller sup,6mA,5mC_5hmC --no-trim --kit-name SQK-RBK114-24 --recursive pod5 > basecalled.bam
```
Run on 0.6.0:
This is not improved by manually setting the batch size to my previously optimised value, with which the same dataset took 1m39s to complete on 0.5.3. Running with --verbose, 0.6.0 seems to run two different models, or to split GPU usage into two?
I've noticed a difference in how it sets the batch size automatically (and it seems to be changing the chunk size? Won't that result in different basecalls? I thought it was tied to hardware), but it is only ~30% complete in the same run time.
I know the changelog mentions optimisations for running on Apple M chips, but these seem to come at a serious loss of performance for RTX GPUs (i.e. PCs running the P2 Solo)