Closed: Yang990-sys closed this issue 1 month ago
Hi @Yang990-sys,
Can you please fill out the issue template? In particular, the command you are using to run dorado would be very helpful here.
As above: I copy the pod5 files and then run basecalling.
I guess it's because I set the batch size too large? By default, the program's batch size only uses about 85% of GPU memory. To speed things up and avoid wasting resources, I intentionally set the batch size larger than that; when the program warned that it was too large, I adjusted it down to a value that fills GPU memory without crashing. Is this approach meaningful, and could it cause the bug above?
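For reference, this is roughly the kind of invocation being described: a minimal sketch of overriding dorado's automatic batch-size selection with `--batchsize` (the model name, paths, and batch-size value here are placeholders, not the reporter's actual command):

```shell
# Explicitly set the batch size instead of letting dorado pick one
# (batchsize 0, the default, triggers the automatic selection sweep).
# Paths and the value 1536 are illustrative placeholders.
dorado basecaller rna004_130bps_sup@v3.0.1 /data/pod5/ \
    --device cuda:0 \
    --batchsize 1536 \
    > calls.bam
```

If the chosen value is too large, dorado will warn and a smaller value should be used; leaving `--batchsize` at its default lets the timing sweep pick an efficient value automatically.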
Hi @Yang990-sys,
The default batch size is determined by performing a timing sweep of different batch sizes to determine the most efficient value - this may not be the same as the value that uses the maximum GPU memory (or we'd just have picked the max value as the default!). Having said that, I don't see how that would cause a long delay in shutdown, except that dorado needs to deallocate the memory at the end.
For a 13 hour run, that's a lot of data - 100% may not be exactly 100% and more processing may still be occurring. Are you able to run a process monitor such as `htop`? Additional processing, such as the poly-A estimation, occurs on the CPU and so will not show as GPU activity.
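The lingering CPU work can also be confirmed without `htop`. Below is a minimal sketch, assuming a Linux system with the procps `ps` utility; the `list_threads` helper and the way the PID is obtained are illustrative, not part of dorado:

```python
import os
import subprocess

def list_threads(pid):
    """Return (tid, %cpu, command) rows for every thread of `pid`,
    using `ps -L` (Linux/procps). Illustrative helper, not a dorado API."""
    out = subprocess.run(
        # `tid=`, `pcpu=`, `comm=` suppress the header row.
        ["ps", "-L", "-o", "tid=,pcpu=,comm=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in out.stdout.splitlines():
        tid, pcpu, comm = line.split(None, 2)
        rows.append((int(tid), float(pcpu), comm))
    return rows

# Inspect our own process as a demonstration; in practice, point `pid`
# at the dorado process (e.g. found via `pgrep dorado`).
for tid, pcpu, comm in list_threads(os.getpid()):
    print(tid, pcpu, comm)
```

If any thread of the dorado process still shows non-zero CPU after the progress bar hits 100%, post-processing is still running even though GPU utilisation reads 0.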
You are right: even when GPU usage is at 0, there are still some threads running on the CPU. Thank you for the clarification!
Hello,
I am using Dorado v0.5.3 for RNA004 basecalling, but I frequently see the progress bar reach 100% while memory is not released for a long time, anywhere from 20 minutes to 5 hours. Is this normal behaviour or a bug?
Memory is still occupied:
The progress bar reached 100% 5 hours ago:
![image](https://github.com/nanoporetech/dorado/assets/54176741/6502f2f0-5697-495b-bf92-295f56c08ae2)