Low % basecalled, not using all GPU

nanoporetech / dorado

Oxford Nanopore's Basecaller

https://nanoporetech.com/

Low % basecalled, not using all GPU #669

Status: Closed (franztastic closed this issue 5 months ago)

franztastic commented 6 months ago

Hi everyone,

In our lab we're running an enrichment adaptive sampling experiment using MinKNOW.

After an hour of running the experiment, the basecalled percentage is still low, at about 65% (for a long while it was 34%). When we check GPU usage, it is about 50-60% or even less.

We are concerned that the basecaller is not fast enough to keep up, so we might be throwing away many more sequences than we should.

We were expecting full use of the GPU and a basecalled percentage of about 90% in order to achieve optimal adaptive sampling performance.

Dorado version: 7.2.13
Dorado command: (just using the MinKNOW UI)
Operating system: Ubuntu 22.04.4
Hardware (CPUs, Memory, GPUs): NVIDIA RTX A6000
Source data location (on device or networked drive - NFS, etc.): PromethION - local

Is there some way we can increase the basecalled percentage?

Thank you very much for your assistance, best wishes!

franztastic commented 6 months ago

Hi again, after an hour, the basecaller % is at 94% and GPU usage is still at 47%. I guess it's normal to wait an hour to reach these values...

Thanks!

ethan-mcq commented 6 months ago

I may be wrong, so someone correct me, but it might be behaving this way due to read chunks, i.e. your read count is so low due to adaptive sampling that it is waiting for a certain number of reads to be sequenced before running a basecalling chunk. By default, POD5 files are not saved until a minimum of 4k reads. Similarly, live basecalling may not begin until certain read count thresholds are met (4k, then 8k, then 12k, etc.).

frumencelab commented 6 months ago

Hello,

I have encountered the same issue with the latest version of MinKNOW. During sequencing, my current GPU (RTX 3070 8GB) struggles to perform live basecalling, achieving only 30-40% efficiency. Despite more than 24 hours of sequencing, this figure only marginally improves, never exceeding 80-90%. Additionally, my graphics card is consistently underutilized, with only 50-70% utilization and 50% of memory occupied by dorado_basecall_server.

I would greatly appreciate any insights or suggestions to enhance the basecalling process.

Dorado version: 7.2.13
Dorado command: using MinKNOW 23.11
Operating system: Ubuntu 20.04
Hardware (CPUs, Memory, GPUs): NVIDIA RTX 3070 8GB
Source data location (on device or networked drive - NFS, etc.): MinION - local

Thank you in advance for your time and assistance.

Best regards,

tijyojwad commented 5 months ago

Hi @franztastic - your issue is similar to a throttling issue we noticed in MinKNOW, which is being fixed in the 5.9 release. So the upcoming release should fix your problem.

@frumencelab - I discussed with the team internally and your issue seems to be different. Could you open a ticket on the Nanopore Community page?
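
For anyone following up on either report, a log of GPU utilization and memory over time may be more useful than a single reading. Below is a minimal sketch in Python, assuming `nvidia-smi` is on the PATH and returns numeric values for every queried field. The function names (`parse_gpu_line`, `sample_gpus`, `log_gpus`) are hypothetical helpers for illustration and are not part of Dorado or MinKNOW.

```python
import subprocess
import time

# Fields requested from nvidia-smi; returned as one CSV line per GPU.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_line(line):
    """Parse one CSV line such as '47, 4096, 8192' into a dict.

    Assumes all three fields are numeric (no '[N/A]' entries).
    """
    util, mem_used, mem_total = (float(x.strip()) for x in line.split(","))
    return {
        "gpu_util_pct": util,
        "mem_used_mib": mem_used,
        "mem_total_mib": mem_total,
        "mem_used_pct": round(100.0 * mem_used / mem_total, 1),
    }


def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


def log_gpus(samples=5, interval_s=2.0):
    """Print a timestamped utilization line per GPU, `samples` times."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        for idx, gpu in enumerate(sample_gpus()):
            print(
                f"{stamp} gpu{idx} "
                f"util={gpu['gpu_util_pct']:.0f}% "
                f"mem={gpu['mem_used_pct']:.1f}%"
            )
        time.sleep(interval_s)
```

Calling `log_gpus()` while a run is in progress prints one line per GPU per sample, which could be attached to a support ticket alongside the basecalled percentage shown in the MinKNOW UI.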