yang-song / score_sde_pytorch

PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)
https://arxiv.org/abs/2011.13456
Apache License 2.0

Training process for multiple GPUs #38

Open Jaehoon-zx opened 1 year ago

Jaehoon-zx commented 1 year ago

Hi, I am trying to run training/evaluation with 4 A100s. However, after some experiments I noticed that the training speed was the same as with a single GPU. Am I missing something?

mo666666 commented 1 year ago

Hello, Jaehoon. I encountered the same problem. I suspect it is because your TensorFlow package is not installed correctly. I recommend following the tips at https://www.tensorflow.org/install/pip step by step; that may help you solve the problem.
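
For reference, a quick way to check whether the TensorFlow install (used here for the tf.data input pipeline) was built with CUDA support and can see all four A100s; this is a generic diagnostic, not something from this repo:

```python
# Verify that TensorFlow can see the GPUs; expect four PhysicalDevice entries
# on a 4xA100 machine if the CUDA-enabled build is installed correctly.
import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))
```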

mo666666 commented 1 year ago

However, after solving the above issue, also as a 4xA100 user, I ran into a CUDA out-of-memory issue. Did you encounter this with the code in this repository?

Jaehoon-zx commented 1 year ago

Take a look at this: https://github.com/yang-song/score_sde_pytorch/issues/14#issuecomment-1075887846 I solved the CUDA memory issue by adding the snippet there to main.py.
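
The exact snippet lives in the linked comment; a minimal sketch of this kind of fix, assuming the culprit is TensorFlow pre-allocating GPU memory it does not need (TF is only used for data loading here, while PyTorch does the training):

```python
# In main.py, before any other TensorFlow work: hide the GPUs from TF so it
# stays on the CPU and leaves all GPU memory to PyTorch.
import tensorflow as tf

tf.config.experimental.set_visible_devices([], "GPU")
```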

mo666666 commented 1 year ago

Ok, thank you very much!

mo666666 commented 1 year ago

Hi, Jaehoon! Did your training speed on 4xA100 improve? After re-checking my experiment, I found it is still quite slow: each GPU sits at around 50% utilization. Have you found another trick to accelerate training, or can the author @yang-song provide some advice?
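
One possible cause, if the score model is wrapped with torch.nn.DataParallel (as this repo's run_lib.py appears to do): DataParallel drives all GPUs from a single process and often caps utilization well below 100%. A minimal sketch of switching to DistributedDataParallel instead; the model and loss below are stand-ins, not the repo's actual training step:

```python
# Sketch: one process per GPU with DistributedDataParallel (DDP).
# Launch with: torchrun --nproc_per_node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK (and RANK/WORLD_SIZE) for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(32, 32).cuda(local_rank)  # stand-in for the score model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

    for step in range(10):
        x = torch.randn(8, 32, device=local_rank)  # stand-in for a data batch
        loss = model(x).pow(2).mean()               # stand-in for the score-matching loss
        optimizer.zero_grad()
        loss.backward()  # DDP all-reduces gradients across the 4 processes here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

With DDP, each GPU gets its own process and its own share of the batch, which usually scales much better than DataParallel's single-process scatter/gather; the per-GPU batch size and the data loader would also need to be split accordingly.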