isayevlab / Auto3D_pkg

Auto3D generates low-energy conformers from SMILES/SDF
MIT License

Code blocks but doesn't use GPUs #48

Closed OlgaGKononova closed 8 months ago

OlgaGKononova commented 12 months ago

Hi there!

I am running Auto3D on 200 SMILES with the --use_gpu flag set to True, and I found that it blocks all the available GPUs on the machine but runs calculations on only one of them:

For the run:

python Auto3D_pkg/auto3D.py tests/input_corrected.smi --k=5 --enumerate_tautomer false --enumerate_isomer false --capacity 1

There is the following output in the log:

The available memory is 32 GB.
The task will be divided into 7 jobs.
Job1, number of inputs: 30
Job2, number of inputs: 30
Job3, number of inputs: 29
Job4, number of inputs: 29
Job5, number of inputs: 29
Job6, number of inputs: 29
Job7, number of inputs: 29
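
The division shown in the log can be sketched as a simple chunking step (a hedged illustration of the behavior described, not Auto3D's actual code; the function name and the memory-to-job-count step are assumptions):

```python
def split_into_jobs(smiles, n_jobs):
    """Split a list of SMILES into n_jobs near-equal chunks.
    In Auto3D the job count would be derived from available memory;
    here it is simply a parameter."""
    base, extra = divmod(len(smiles), n_jobs)
    jobs, start = [], 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)  # first `extra` jobs get one more input
        jobs.append(smiles[start:start + size])
        start += size
    return jobs

inputs = [f"SMILES_{i}" for i in range(205)]  # dummy inputs matching the log's total
jobs = split_into_jobs(inputs, 7)
print([len(j) for j in jobs])  # → [30, 30, 29, 29, 29, 29, 29]
```

This reproduces the job sizes in the log above (two jobs of 30, five of 29).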

And the nvidia-smi output (see the cocoa_env/bin/python processes):

[screenshot of nvidia-smi output]

So, it only uses one GPU, but blocks 5. Is this intended behavior? Is the code able to parallelize over multiple GPUs? I tried tuning the --capacity option, but the result seems to be the same.

isayev commented 12 months ago

Dear @OlgaGKononova thanks for reporting. This is not expected behavior:) We will look into it.

OlgaGKononova commented 12 months ago

Dear @isayev, thank you for the reply. Do you have an estimate of how long a fix might take on your side? We would like to decide whether it is worth waiting for the fix or proceeding with the current version.

Also, if I had enough SMILES to fill one GPU, would it automatically occupy another GPU with the rest of the SMILES, or would it still wait for the busy GPU to finish? In other words, is the code able to parallelize over multiple GPUs?

Thank you.

LiuCMU commented 12 months ago

Dear @OlgaGKononova,

Thank you for your follow-up.

I would recommend proceeding with the latest version (2.1.0). I was unable to reproduce this issue on our local cluster with 3 GPUs, so it may be a challenging problem related to hardware. It will likely take some time to pinpoint the exact cause.

Based on your screenshot, it appears that no memory or computing resources were consumed for GPU2, 3, and 4. Therefore, this issue probably won't affect any existing processes on those GPUs.

Currently, Auto3D does not support parallelization across multiple GPUs. It utilizes a single GPU and divides the input SMI file into smaller jobs, running each of them on that GPU. You can specify which GPU to use with the gpu_idx argument:

python Auto3D_pkg/auto3D.py tests/input_corrected.smi --k=5  --enumerate_isomer false --capacity 1 --gpu_idx=4

In the above case, the GPU at index 4 will be used. This command does the same job as your previous one, except that it uses the GPU at index 4. By default, tautomers are not enumerated, so I dropped --enumerate_tautomer false for brevity.
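
Under the hood, single-GPU selection of this kind typically reduces to building a PyTorch device string from the index. A minimal sketch of that mapping (the helper name is illustrative, not Auto3D's internal API):

```python
def pick_device(use_gpu, gpu_idx=0):
    """Return the device string a PyTorch-based tool would hand to torch.device().
    Falls back to CPU when GPU use is disabled."""
    return f"cuda:{gpu_idx}" if use_gpu else "cpu"

print(pick_device(True, 4))  # → cuda:4
print(pick_device(False))    # → cpu
```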

LiuCMU commented 11 months ago

Thank you for the update!

That is surprising, since the value of gpu_idx goes directly into torch.device(f"cuda:{gpu_idx}"). To check whether this is an Auto3D issue or something hardware-related, could you try running any general PyTorch script and see if you can control which GPU is used?

One workaround is to prepend the CUDA_VISIBLE_DEVICES environment variable to your command: CUDA_VISIBLE_DEVICES="your_gpu_idx" python Auto3D_pkg/auto3D.py tests/input_corrected.smi --k=5 --enumerate_isomer false
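
As background on why this workaround helps: CUDA_VISIBLE_DEVICES renumbers the GPUs a process can see, so with CUDA_VISIBLE_DEVICES="4" the physical GPU 4 appears to the process as cuda:0. A hedged sketch of that remapping logic (pure Python, no CUDA required; the helper is mine, not part of PyTorch or Auto3D):

```python
import os

def visible_index(physical_idx, env=os.environ):
    """Map a physical GPU index to the index the process sees under
    CUDA_VISIBLE_DEVICES; return None if that GPU is hidden."""
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:  # no restriction: indices map one-to-one
        return physical_idx
    visible = [int(x) for x in raw.split(",") if x.strip()]
    return visible.index(physical_idx) if physical_idx in visible else None

# With CUDA_VISIBLE_DEVICES="4", physical GPU 4 is the process's cuda:0
print(visible_index(4, {"CUDA_VISIBLE_DEVICES": "4"}))  # → 0
print(visible_index(2, {"CUDA_VISIBLE_DEVICES": "4"}))  # → None (hidden)
```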

On Thu, Oct 5, 2023 at 5:10 AM Olga Kononova @.***> wrote:

UPD: I also found that the --gpu_idx flag is somehow ignored: no matter what I put there, the computations run with gpu_idx=0



LiuCMU commented 9 months ago

Hello @OlgaGKononova, Auto3D now supports running jobs with multiple GPUs. To use multiple GPUs, you just need to pass the GPU indexes to the gpu_idx parameter as a comma-separated string. For example, --gpu_idx=0,1,2 will use the GPUs at indexes 0, 1, and 2.
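
The comma-separated form presumably gets parsed into a list of integer indexes before devices are assigned. A hedged sketch of such a parser (the function name is illustrative, not Auto3D's internal API):

```python
def parse_gpu_idx(value):
    """Parse gpu_idx given either as a single int or as a
    comma-separated string like '0,1,2'; return a list of indexes."""
    if isinstance(value, int):
        return [value]
    return [int(part) for part in str(value).split(",") if part.strip()]

print(parse_gpu_idx("0,1,2"))  # → [0, 1, 2]
print(parse_gpu_idx(4))        # → [4]
```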

LiuCMU commented 8 months ago

I will close it for now. Please let us know if you have additional questions.