melachl closed this issue 6 months ago
Did you try 0.4.1? I could basecall 1.05TB promethion simplex raw data in 6h40min with dna_r10.4.1_e8.2_400bps_sup@v4.2.0 and dna_r10.4.1_e8.2_400bps_sup@v4.2.0_5mCG_5hmCG@v3.1 model on my 4xA100 machine with 0.4.1. Haven't tried 0.4.3 yet.
No, I didn't, because I always get an error when I build another dorado version from source.
@melachl have you tried one of the pre-built dorado binaries? https://github.com/nanoporetech/dorado#installation
Yes, that's how I installed the dorado version 0.4.3 but the performance is very very bad!
Is your data fast5 or pod5? dorado doesn't work well with fast5 as far as I know.
I use pod5 files. Dorado worked quite well for me until I upgraded to version 0.4.3. I have the impression that the new version is the problem, but I don't know if anyone else has the same problem.
I am currently running a large dataset with both 0.4.1 and 0.4.3 (simplex, sup) with 5mC_5hmC modbase model, each on a single A100 .. as the progress bar has been removed, I have no (real) clue about ETA ..
You could potentially estimate it using wc -l on the output files to get the number of lines and compare to total reads.
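The line-count estimate suggested above can be sketched as follows. This is only a minimal sketch, assuming the output is being written as FASTQ text (four lines per read) and that the total number of reads is already known. The function names, the file name, and the read total in the usage line are hypothetical. Note that `wc -l` is not meaningful on a binary BAM file; counting records there would need a tool such as `samtools view -c` instead.

```shell
#!/bin/sh
# Estimate basecalling progress from a partially written FASTQ file.
# Assumption: plain-text FASTQ with exactly 4 lines per read.

# fastq_reads FILE -> number of complete reads written so far
fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# progress_pct DONE TOTAL -> integer percentage (rounded down)
progress_pct() {
    if [ "$2" -le 0 ]; then
        echo "total must be positive" >&2
        return 1
    fi
    echo $(( $1 * 100 / $2 ))
}
```

Hypothetical usage, with a made-up output file and read total: `progress_pct "$(fastq_reads calls.fastq)" 4000000`.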
I have a quick update: I successfully installed the older dorado version 0.3.4 again and started the basecalling process, and now the basecalling takes only a few hours and not 3 days like with the latest dorado version 0.4.3. So the performance problem is definitely on the side of the new dorado version.
Hi @melachl - this does seem like a regression. Can you post the basecalling cmd you're using?
@sklages
> as the progress bar has been removed
This is unexpected. I'm able to see the progress bar with 0.4.3. Are you redirecting stderr somewhere?
I used: $ dorado_034 duplex dna_r10.4.1_e8.2_400bps_hac@v4.2.0 pod5/ > calls_hac.bam
Thanks @melachl - I am able to repro the regression. Sorry about that, we'll look into that right away.
@sklages - my apologies, I was checking the progress bar for simplex. I can confirm that progress bar for duplex has disappeared :(.
> @sklages
> as the progress bar has been removed
> this is unexpected. I'm able to see the progress bar with 0.4.3. are you redirecting the stderr somewhere?
Yes, stderr is redirected. That's why I do not get the progress bar. Though I wish there were some progress to see even if stderr is redirected.
> @sklages - my apologies, I was checking the progress bar for simplex. I can confirm that progress bar for duplex has disappeared :(.
Both basecallings are run in simplex (sup) mode for "benchmarking". Duplex basecalling with current pod5 (0.3.2) and current dorado (0.4.3) is unusable for me.
Yes, ack - we're able to repro and debugging it now
Hi @melachl - can you try to run duplex on v0.4.1 and v0.4.2? We did some more runs, and on A100 we're seeing fairly consistent performance. Can you also post what CPU and GPU you're using?
One theory for why going from 0.3.4 to 0.4.x slows down duplex for you is that we switched to using minimap2 to determine overlap for potential duplex pairs. This adds some CPU overhead, so if your CPU is a bottleneck it could show up more here.
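A version-by-version comparison like the one requested above could be scripted with a small timing helper. This is a rough sketch, not a benchmark harness: `time_run` is a hypothetical helper, the binary names `dorado_041`, `dorado_042` and `dorado_043` are assumptions that follow the `dorado_034` naming used elsewhere in this thread, and `pod5_subset/` stands for whatever small set of pod5 files is used for the test.

```shell
#!/bin/sh
# time_run LABEL CMD [ARGS...]
# Runs CMD with its stdout discarded, prints "LABEL<TAB>SECONDS",
# and returns CMD's exit status. Resolution is whole seconds.
time_run() {
    label=$1
    shift
    start=$(date +%s)
    "$@" > /dev/null
    status=$?
    end=$(date +%s)
    printf '%s\t%s\n' "$label" "$(( end - start ))"
    return "$status"
}

# Hypothetical usage: same model and same input for every version.
# for v in 034 041 042 043; do
#     time_run "$v" "dorado_$v" duplex dna_r10.4.1_e8.2_400bps_hac@v4.2.0 pod5_subset/
# done
```

Because the basecalls are written to `/dev/null`, this only measures run time; the progress bar and log messages still go to stderr.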
I am using a MacBook Pro with an M1 Max (CPU: 10 cores, GPU: 32 cores).
I am currently sequencing, so at the moment I am not able to try v0.4.1 and v0.4.2, but I will give you an update as soon as I can.
Could you tell us how much memory the machine has?
It looks as if this was run with automatic batch size selection. If so, then older versions of dorado would quite possibly have had performance problems stemming from memory thrashing, even without duplex involved. 0.5.0 should do significantly better both in terms of batch size selection and the amount of memory it needs for a given batch size. That said, duplex is in general not as optimised on Apple silicon as simplex.
I have 64 GB RAM.
I was very surprised, because with older dorado versions duplex basecalling worked quite well and was also fast. But with the latest one there was a big drop in performance. In future, should I set the batch size to a certain value? Could you recommend a batch size? Any other hints on how to deal with duplex basecalling on Apple silicon? Is it planned to optimize duplex basecalling for Apple silicon as well?
I'm not aware of Apple silicon duplex performance regressions between the versions you've tried, so I'm not sure how to explain the disparity you saw. I think it's well worth trying 0.5.0, since it should use substantially less memory than previous versions, and excessive memory use was causing very poor performance in certain scenarios. If performance is still bad then we'll need to investigate further.
Thank you for your help! I tried the new version 0.5.0 but I get an error:
(base) melanieachleitner@MBP-von-Melanie ~ % dorado duplex hac pod5/ > calls.bam
[2023-12-11 08:05:40.448] [info] > No duplex pairs file provided, pairing will be performed automatically
[2023-12-11 08:05:40.560] [error] Failed to find the kernel: forward_scan_add_softmax
Can anyone help me with that?
That error suggests you have a mismatch between the dorado executable and the shader library it uses, default.metallib. I'd suggest a clean install, ensuring you have the version of default.metallib associated with 0.5.0, rather than one from an earlier version.
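One way to look for a stale shader library is to list every copy of `default.metallib` under the directories where dorado has been installed. This is a minimal sketch: `list_metallibs` is a hypothetical helper, and the directories passed to it are assumptions, since the thread does not say where each install lives or where the executable looks for the file.

```shell
#!/bin/sh
# list_metallibs DIR...
# Prints the path of every default.metallib found under the given
# directories, so copies left over from older installs stand out.
list_metallibs() {
    for dir in "$@"; do
        # Skip arguments that are not existing directories.
        [ -d "$dir" ] || continue
        find "$dir" -type f -name default.metallib -print
    done
}
```

Hypothetical usage: `list_metallibs ~/tools/dorado-0.4.3 ~/tools/dorado-0.5.0`, where both paths are made up. If more than one copy turns up, removing the old install and unpacking 0.5.0 into a fresh directory matches the clean install suggested above.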
I updated dorado from 0.3.4 to 0.4.3 because I wanted to demultiplex my samples, which worked quite well, but when I now try to basecall data in duplex mode the performance is very bad. Basecalling in simplex mode also takes much longer than before. A very small targeted sequencing experiment on mitochondrial DNA takes more than 2 days to basecall in duplex mode. Basecalling of such experiments usually takes only 30 min to 3 h, depending on the number of reads. I also tried the advice to split the reads by channel to speed things up, but that does not help at all. Could anyone help me with this issue? Is there a problem with dorado, or is the problem on my side?