mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0

GPU usage with the new version #237

Open gusSCIMOV opened 3 months ago

gusSCIMOV commented 3 months ago

Hi. With the latest version, I completed the torch setup inside a Docker container based on nvcr.io/nvidia/pytorch:23.12-py3, and the environment (.yml) now installs in a more straightforward way. Although torch sees my GPUs (torch.cuda.device_count() == 3), my docker stats output shows no GPU usage at all, only a high memory load (~92 GB), and after two hours the whole process gets stuck (attached image: docker_stats). I select that device (3) when running main.py. Do you have any suggestions for the GPU configuration?
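One thing worth knowing here: docker stats reports CPU, memory, network and block I/O per container, but not GPU utilization, so it cannot show whether a GPU is idle. Also, when torch.cuda.device_count() == 3 the valid device indices are 0, 1 and 2, so if "(3)" refers to an index it would not exist. Below is a minimal sketch for checking per-GPU utilization from inside the container, assuming nvidia-smi is available there (NVIDIA's PyTorch images normally provide it when the container is started with GPU access). The helper names parse_gpu_stats and query_gpus are hypothetical.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi. With "noheader,nounits" each GPU
# prints one line such as "0, 37, 5120, 81920" (memory values in MiB).
QUERY = "index,utilization.gpu,memory.used,memory.total"


def _to_int(value):
    # Some GPUs report "[N/A]" or "[Not Supported]" instead of a number.
    value = value.strip()
    return int(value) if value.isdigit() else None


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = line.split(",")
        stats.append({
            "index": _to_int(index),
            "util_percent": _to_int(util),
            "mem_used_mib": _to_int(used),
            "mem_total_mib": _to_int(total),
        })
    return stats


def query_gpus():
    """Run nvidia-smi; return parsed stats, or None if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi unavailable: the container may lack GPU access")
    else:
        for gpu in gpus:
            print(f"GPU {gpu['index']}: {gpu['util_percent']}% util, "
                  f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB used")
```

Running this while main.py trains shows whether the process holds any GPU memory at all. To pin a PyTorch script to a single GPU, the usual approach is the standard CUDA_VISIBLE_DEVICES environment variable set at launch (for example CUDA_VISIBLE_DEVICES=0 followed by the usual python main.py command); inside the process, the selected GPU is then renumbered as device 0.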

fedshyvana commented 3 months ago

main.py should not demand heavy GPU memory, because the MIL models built on top of the extracted features are very lightweight (in fact, I think you can run it at a reasonable speed even on the CPU). In my experience, main.py is mostly bottlenecked by IO (e.g., storing the extracted features on an NVMe SSD vs. an HDD makes a huge difference). There may be other issues specific to your use case, though; if you have any error messages, please share them. I will try to do some more testing myself this weekend if I have time.
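To check whether disk IO is the bottleneck, one option is to measure the raw read throughput of the extracted feature files on the drive where they live. This is a minimal stdlib sketch; it assumes the features were saved as .pt or .h5 files somewhere under one directory, and read_throughput is a hypothetical helper, not part of CLAM. It times raw reads only, with no deserialization.

```python
import os
import time


def read_throughput(feature_dir, extensions=(".pt", ".h5"), chunk_size=1 << 20):
    """Read every matching file under feature_dir and report throughput.

    Only raw disk reads are timed; files are not deserialized.
    """
    total_bytes = 0
    n_files = 0
    start = time.perf_counter()
    for root, _dirs, files in os.walk(feature_dir):
        for name in files:
            if not name.endswith(extensions):
                continue  # skip anything that is not a feature file
            with open(os.path.join(root, name), "rb") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total_bytes += len(chunk)
            n_files += 1
    elapsed = time.perf_counter() - start
    mb_per_s = total_bytes / elapsed / 1e6 if elapsed > 0 else float("inf")
    return {"files": n_files, "bytes": total_bytes,
            "seconds": elapsed, "mb_per_s": mb_per_s}
```

For example, read_throughput("/path/to/features") can be run once against features on an HDD and once against a copy on an NVMe SSD. Repeated runs may be served from the OS page cache and look much faster than a cold read, so the first run after copying (or after a reboot) is the more representative number.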

liyuyao0807 commented 2 months ago

Hi @gusSCIMOV, are you using the new pipeline or the old one? I'm running into some issues configuring my environment and was wondering if I could take a look at your .yml file. If possible, could you please reply to this message? Thank you very much! Best regards, liyuyao0807