nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

very bad performance of dorado 0.4.3. #497

Closed: melachl closed this issue 6 months ago

melachl commented 11 months ago

I updated dorado from 0.3.4 to 0.4.3 because I wanted to demultiplex my samples, which worked quite well. But now, when I try to basecall data in duplex mode, the performance is very bad. Basecalling in simplex mode also takes much longer than before. Duplex basecalling of a very small targeted-sequencing experiment on mitochondrial DNA takes more than 2 days, while basecalling such experiments usually takes only 30 min to 3 h, depending on the number of reads. I also tried the advice to split the reads by channel to speed things up, but that does not help at all. Could anyone help me with this issue? Is there a problem with dorado, or is the problem on my side?

ymcki commented 11 months ago

Did you try 0.4.1? I could basecall 1.05TB promethion simplex raw data in 6h40min with dna_r10.4.1_e8.2_400bps_sup@v4.2.0 and dna_r10.4.1_e8.2_400bps_sup@v4.2.0_5mCG_5hmCG@v3.1 model on my 4xA100 machine with 0.4.1. Haven't tried 0.4.3 yet.

melachl commented 11 months ago

No, I didn't, because I always get an error when I build another dorado version from source.

tijyojwad commented 11 months ago

@melachl have you tried one of the pre-built dorado binaries? https://github.com/nanoporetech/dorado#installation

melachl commented 11 months ago

Yes, that's how I installed dorado 0.4.3, but the performance is very, very bad!

ymcki commented 11 months ago

Is your data fast5 or pod5? dorado doesn't work well with fast5 as far as I know.

melachl commented 11 months ago

I use pod5 files. Dorado worked quite well for me until I upgraded to version 0.4.3. I have the impression that the new version is the problem, but I don't know if anyone else has the same problem.

sklages commented 11 months ago

I am currently running a large dataset with both 0.4.1 and 0.4.3 (simplex, sup) with the 5mC_5hmC modbase model, each on a single A100. As the progress bar has been removed, I have no (real) idea of the ETA.

Psy-Fer commented 11 months ago

You could potentially estimate it by running wc -l on the output files to get the number of lines and comparing that to the total number of reads.
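A minimal Python sketch of that estimate, assuming the calls are written as text (SAM, or FASTQ with 4-line records) rather than BAM, which is binary and compressed, so counting its lines is meaningless. The function names `count_records` and `percent_done` are hypothetical, and `total_reads` is a number you supply yourself (for example, the read count of the input pod5 files). In duplex mode the output holds both simplex and duplex records, so treat the result only as a rough indicator; it can exceed 100%.

```python
def count_records(path: str) -> int:
    """Count reads in a text basecall output file.

    Assumes 4-line FASTQ records for .fastq/.fq files; anything else
    is treated as SAM text, where header lines start with '@'.
    """
    with open(path) as fh:
        if path.endswith((".fastq", ".fq")):
            # Same idea as `wc -l`, divided by the 4 lines per record.
            return sum(1 for _ in fh) // 4
        # SAM: one record per non-empty, non-header line
        # (SAM read names cannot start with '@').
        return sum(1 for line in fh if line.strip() and not line.startswith("@"))


def percent_done(path: str, total_reads: int) -> float:
    """Rough progress estimate: records written so far vs. expected reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * count_records(path) / total_reads
```

For example, `percent_done("calls.sam", 120000)` reports how far a run with an expected 120,000 input reads has progressed, as long as the output file is being written as text.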

melachl commented 11 months ago

A quick update: I successfully reinstalled the older dorado version 0.3.4 and started basecalling again, and now basecalling takes only a few hours instead of 3 days as with the latest version, 0.4.3. So the performance problem is definitely on the side of the new dorado version.

tijyojwad commented 11 months ago

Hi @melachl - this does seem like a regression. Can you post the basecalling cmd you're using?

tijyojwad commented 11 months ago

@sklages

as the progress bar has been removed

this is unexpected. I'm able to see the progress bar with 0.4.3. are you redirecting the stderr somewhere?

melachl commented 11 months ago

I used: $ dorado_034 duplex dna_r10.4.1_e8.2_400bps_hac@v4.2.0 pod5/ > calls_hac.bam

tijyojwad commented 11 months ago

Thanks @melachl - I am able to repro the regression. Sorry about that, we'll look into that right away.

@sklages - my apologies, I was checking the progress bar for simplex. I can confirm that progress bar for duplex has disappeared :(.

sklages commented 11 months ago

@sklages

as the progress bar has been removed

this is unexpected. I'm able to see the progress bar with 0.4.3. are you redirecting the stderr somewhere?

Yes, stderr is redirected; that's why I don't get the progress bar. Still, I wish there were some progress indication even when stderr is redirected.

sklages commented 11 months ago

@sklages - my apologies, I was checking the progress bar for simplex. I can confirm that progress bar for duplex has disappeared :(.

Both basecalling runs are in simplex (sup) mode, for "benchmarking".

Duplex basecalling with current pod5 (0.3.2) and current dorado (0.4.3) is unusable for me.

tijyojwad commented 11 months ago

Yes, ack - we're able to repro and debugging it now

tijyojwad commented 11 months ago

Hi @melachl - can you try to run duplex on v0.4.1 and v0.4.2? we did some more runs and on A100, we're seeing fairly consistent performance. Can you also post what CPU and GPU you're using?

One theory for why going from 0.3.4 to 0.4.x slows down duplex for you is that we switched to using minimap2 to determine the overlap of potential duplex pairs. This adds some CPU overhead, so if your CPU is a bottleneck, it could show up more here.
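To illustrate where that extra CPU work sits, here is a simplified Python sketch of duplex candidate pairing. It assumes, as an illustration rather than dorado's actual implementation, that a complement strand follows its template on the same channel shortly after the template ends; `ReadInfo`, `candidate_pairs` and the 20 s gap threshold are all hypothetical. Grouping by channel first is also why splitting reads by channel is a plausible tactic: both strands of a pair stay in the same subset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReadInfo:
    read_id: str
    channel: int
    start_s: float  # read start time in seconds
    end_s: float    # read end time in seconds


def candidate_pairs(reads, max_gap_s=20.0):
    """Return (template_id, complement_id) candidates.

    A candidate is two reads that follow each other on the same channel with
    a gap of at most max_gap_s seconds (a hypothetical threshold).
    """
    by_channel = defaultdict(list)
    for read in reads:
        by_channel[read.channel].append(read)

    pairs = []
    for channel in sorted(by_channel):
        ordered = sorted(by_channel[channel], key=lambda r: r.start_s)
        for prev, nxt in zip(ordered, ordered[1:]):
            gap = nxt.start_s - prev.end_s
            if 0 <= gap <= max_gap_s:
                # Cheap time-based filter only; each surviving candidate still
                # needs a sequence-overlap check (minimap2 in dorado 0.4.x).
                pairs.append((prev.read_id, nxt.read_id))
    return pairs
```

The time filter itself is cheap; the cost described above comes from the per-candidate overlap alignment that follows it, so more candidates mean more CPU-side minimap2 work.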

melachl commented 11 months ago

I am using a MacBook Pro with an M1 Max (CPU: 10 cores, GPU: 32 cores).

I am currently sequencing, so at the moment I am not able to try v0.4.1 and v0.4.2, but I will give you an update as soon as I can.

StuartAbercrombie commented 11 months ago

Could you tell us how much memory the machine has?

It looks as if this was run with automatic batch size selection. If so, then older versions of dorado would quite possibly have had performance problems stemming from memory thrashing, even without duplex involved. 0.5.0 should do significantly better both in terms of batch size selection and the amount of memory it needs for a given batch size. That said, duplex is in general not as optimised on Apple silicon as simplex.
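The trade-off behind automatic batch size selection can be sketched as simple arithmetic: pick the largest batch whose estimated footprint fits within a memory budget. This is a hypothetical Python illustration, not dorado's actual heuristic; `pick_batch_size`, the per-chunk byte estimate, the 50% budget, the granularity of 64 and the cap of 4096 are all assumptions. On Apple silicon the CPU and GPU share the same unified memory, so an overestimated batch competes with everything else on the machine and can trigger the memory thrashing described above.

```python
def pick_batch_size(total_mem_bytes: int,
                    bytes_per_chunk: int,
                    usable_fraction: float = 0.5,
                    granularity: int = 64,
                    max_batch: int = 4096) -> int:
    """Largest batch size (a multiple of `granularity`) whose estimated
    footprint fits in `usable_fraction` of total memory.

    All defaults are hypothetical illustration values, not dorado's.
    """
    budget = int(total_mem_bytes * usable_fraction)
    fits = budget // bytes_per_chunk  # chunks that fit in the budget
    batch = min(fits, max_batch) // granularity * granularity
    if batch == 0:
        raise ValueError("even the smallest batch exceeds the memory budget")
    return batch
```

With 64 GiB of memory and a hypothetical 16 MiB per chunk, this picks 2048. If automatic selection misbehaves, setting the batch size explicitly via dorado's batch-size option (check `dorado duplex --help` for the exact flag in your version) plays the same role as lowering `usable_fraction` here.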

melachl commented 11 months ago

I have 64 GB RAM.

I was very surprised, because with older dorado versions duplex basecalling worked quite well and was also fast, but with the latest one there was a big performance drop. Should I set the batch size to a specific value in future? Could you recommend a batch size? Do you have any other hints for duplex basecalling on Apple silicon? Are there plans to optimize duplex basecalling for Apple silicon as well?

StuartAbercrombie commented 11 months ago

I'm not aware of Apple silicon duplex performance regressions between the versions you've tried, so I'm not sure how to explain the disparity you saw. I think it's well worth trying 0.5.0, since it should use substantially less memory than previous versions, and excessive memory use was causing very poor performance in certain scenarios. If performance is still bad then we'll need to investigate further.

melachl commented 11 months ago

Thank you for your help! I tried the new version 0.5.0 but I get an error:

(base) melanieachleitner@MBP-von-Melanie ~ % dorado duplex hac pod5/ > calls.bam
[2023-12-11 08:05:40.448] [info] > No duplex pairs file provided, pairing will be performed automatically
[2023-12-11 08:05:40.560] [error] Failed to find the kernel: forward_scan_add_softmax

Can anyone help me with that?

StuartAbercrombie commented 11 months ago

That error suggests you have a mismatch between the dorado executable and the shader library it uses, default.metallib. I'd suggest a clean install, ensuring you have the version of default.metallib associated with 0.5.0, rather than one from an earlier version.