dorado run time- super high accuracy, promethION

nanoporetech / dorado

Oxford Nanopore's Basecaller

https://nanoporetech.com/


dorado run time- super high accuracy, promethION #1132

Open YiJessePi opened 5 days ago

YiJessePi commented 5 days ago

I'm running dorado on PromethION flowcells, basecalling with super-accuracy and some methylation calls. I run it per pod5 file, and I typically have 60-70 pod5 files per run. Each pod5 file is processed on a Tesla V100 with 16-32 GB of memory and takes ~10 hours, so the whole flowcell needs at least 600 GPU hours (which typically takes 5 days). Does this sound reasonable to you? Is running on each pod5 file separately the correct approach, or is it better to merge them?

Run environment:

Dorado version: 0.7.3
Dorado command: dorado basecaller -v --kit-name - --trim 'all' sup@v5.0.0 --modified-bases 4mC_5mC 6mA
Hardware (CPUs, Memory, GPUs): GPU Tesla V100 with 16-32G
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): PromethION ~150GB
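
The figures above can be sanity-checked with a short calculation. This is only a back-of-envelope sketch: the inputs (60-70 files, ~10 hours per file, ~5 days of wall-clock time) are the approximate numbers quoted in this post, not benchmarks, and the function names are made up for illustration.

```python
# Back-of-envelope estimate using the approximate figures quoted in this issue.
# All inputs are the poster's estimates, not measured benchmarks.

def total_gpu_hours(num_files: int, hours_per_file: float) -> float:
    """GPU hours needed when each pod5 file is basecalled as its own job."""
    return num_files * hours_per_file


def implied_parallel_gpus(gpu_hours: float, wall_clock_days: float) -> float:
    """Average number of GPUs that must be busy to finish in the given time."""
    return gpu_hours / (wall_clock_days * 24)


for num_files in (60, 70):
    hours = total_gpu_hours(num_files, hours_per_file=10.0)
    gpus = implied_parallel_gpus(hours, wall_clock_days=5.0)
    print(f"{num_files} files: {hours:.0f} GPU hours, ~{gpus:.1f} GPUs in parallel")
```

Under these assumptions, finishing 600-700 GPU hours in about 5 days implies roughly five to six V100s working at the same time.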

HalfPhoton commented 5 days ago

Hi @YiJessePi,

Does this sound reasonable to you?

Basecalling with sup and multiple methylation models is going to be relatively slow compared to basecalling with hac or without modifications. Sup plus modifications is the most computationally expensive configuration for basecalling.

We're always working on improving basecalling performance, but at this time relatively slow performance is not unexpected.

Is running on each pod5 file separately the correct approach, or is it better to merge them?

You don't need to merge the pod5 files for simplex dorado basecalling. The performance in both cases will be the same.

Best regards, Rich
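
For illustration, the two ways of launching the job discussed above (one command per pod5 file, or one command over a directory of pod5 files) might be assembled as in the sketch below. It is a minimal sketch, not the poster's actual pipeline: the paths are hypothetical, the `--kit-name` and `--trim` options from the original command are left out because the kit name is not given in the issue, and the argument order (model, then data path, then `--modified-bases`) is assumed from dorado's usual `dorado basecaller <model> <data>` form. The commands are only built and printed, not executed.

```python
from typing import List

# Model and modified-base selections copied from the command in this issue.
MODEL = "sup@v5.0.0"
MOD_BASES = ["4mC_5mC", "6mA"]


def basecaller_command(data_path: str) -> List[str]:
    """Build a dorado basecaller argv for one pod5 file or one directory.

    Assumption: dorado accepts either a single pod5 file or a directory as
    the data argument. Output handling (e.g. redirecting to a BAM file) is
    left to the caller.
    """
    return ["dorado", "basecaller", MODEL, data_path,
            "--modified-bases", *MOD_BASES]


def per_file_commands(pod5_files: List[str]) -> List[List[str]]:
    """One command per pod5 file, e.g. for one scheduler job per file."""
    return [basecaller_command(path) for path in pod5_files]


# Hypothetical paths, for illustration only.
files = ["run1/chunk_0.pod5", "run1/chunk_1.pod5"]
for cmd in per_file_commands(files):
    print(" ".join(cmd))

# Alternative: a single command over the whole directory.
print(" ".join(basecaller_command("run1/")))
```

Splitting per file is mainly a scheduling convenience when several GPUs are available; as noted above, merging the pod5 files first is not needed for simplex basecalling.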

YiJessePi commented 4 days ago

Great, thanks for that! I just wanted to verify I'm not missing something here. We are considering purchasing a GPU machine to make the process smoother on our side. Do you have a recommendation for a GPU server or machine that will work best with dorado?

HalfPhoton commented 4 days ago

Dorado is heavily-optimised for Nvidia A100 and H100 GPUs and will deliver maximal performance on systems with these GPUs.