nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Basecall process 100% but memory is still occupied for a long time #795

Closed Yang990-sys closed 1 month ago

Yang990-sys commented 1 month ago

Hello,

I am using Dorado v0.5.3 for RNA004 basecalling, but I frequently see the progress bar reach 100% while memory is not released for a long time afterwards, anywhere from 20 minutes to 5 hours. Is this normal, or is it a bug?

Memory is still occupied: [image]

The progress bar reached 100% 5 hours ago: [image]

malton-ont commented 1 month ago

Hi @Yang990-sys,

Can you please fill out the issue template? In particular, the command you are using to run dorado would be very helpful here.

Yang990-sys commented 1 month ago

Issue Report

Please describe the issue:

As above

Steps to reproduce the issue:

Copy the pod5 files and run basecalling.

Run environment:

————————————————

I guess it's because I set the batch size too large? With the default batch size, the program uses only about 85% of GPU memory. To speed things up and avoid wasting resources, I intentionally set the batch size too large; the program then warns that the value is too large and adjusts it to one that fills memory without crashing. I don't know whether doing this is worthwhile, or whether it could cause the problem above.

malton-ont commented 1 month ago

Hi @Yang990-sys,

The default batch size is chosen by running a timing sweep over different batch sizes and picking the most efficient one; this may not be the value that uses the most GPU memory (or we'd just have picked the max value as the default!). Having said that, I don't see how that would cause a long delay in shutdown, except that dorado needs to deallocate the memory at the end.
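To illustrate why the fastest batch size need not be the largest one, here is a minimal sketch of a timing sweep in Python. It is not dorado's actual implementation: `pick_batch_size`, `time_batch`, and the timings are hypothetical and made up for illustration.

```python
from typing import Callable, Iterable


def pick_batch_size(candidates: Iterable[int],
                    time_batch: Callable[[int], float]) -> int:
    """Return the candidate batch size with the highest throughput.

    `time_batch(size)` is a hypothetical callable that returns the seconds
    needed to process one batch of `size` items.
    """
    best_size = None
    best_throughput = 0.0
    for size in candidates:
        seconds = time_batch(size)
        throughput = size / seconds  # items processed per second
        if throughput > best_throughput:
            best_size, best_throughput = size, throughput
    if best_size is None:
        raise ValueError("no candidate batch sizes were given")
    return best_size


# Made-up timings (seconds per batch), not measurements from dorado.
fake_timings = {64: 1.0, 128: 1.5, 256: 4.0}
best = pick_batch_size(fake_timings.keys(), fake_timings.__getitem__)
print(best)
```

With these invented numbers the largest batch (256) fills the most memory but processes fewer items per second than 128, so the sweep picks 128.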

For a 13-hour run, that's a lot of data; 100% may not be exactly 100%, and more processing may still be occurring. Are you able to run a process monitor such as htop? Additional processing such as the poly-A estimation occurs on the CPU, so it will not show as GPU activity.
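If an interactive monitor is not convenient, the same check can be scripted. The sketch below is Linux-only and assumes the standard `/proc/<pid>/stat` layout; the helper names are hypothetical. It samples the CPU time of a process twice and reports whether it grew in between.

```python
import time
from pathlib import Path


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so only split what follows the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def is_still_busy(pid: int, interval: float = 5.0) -> bool:
    """Linux only: True if the process used CPU time during `interval` seconds."""
    stat_path = Path(f"/proc/{pid}/stat")
    before = cpu_ticks_from_stat(stat_path.read_text())
    time.sleep(interval)
    after = cpu_ticks_from_stat(stat_path.read_text())
    return after > before
```

For example, `is_still_busy(12345)` (where 12345 is a placeholder for the PID of the running dorado process, which `pgrep dorado` can look up) returning `True` while GPU usage is 0 would suggest that CPU-side work is still in progress.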

Yang990-sys commented 1 month ago

You are right: even when GPU usage is 0, there are still some threads running on the CPU. Thank you for the clarification! [image]