Open Jaehoon-zx opened 1 year ago
Hello, Jaehoon. I encountered the same problem. I suspect it is because your TensorFlow package is not installed correctly. I recommend following the steps at https://www.tensorflow.org/install/pip one by one; that may solve the problem for you.
However, after solving the above issue, I (also a 4×A100 user) ran into a CUDA out-of-memory issue. Do you hit this issue with the code in this repository?
Take a look at this: https://github.com/yang-song/score_sde_pytorch/issues/14#issuecomment-1075887846. I solved the CUDA memory issue by adding that snippet to main.py.
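For anyone else landing here: as I understand the linked comment, the fix is to stop TensorFlow (which this repo uses only for the data pipeline) from reserving CUDA memory that PyTorch needs. A minimal sketch of that idea, guarded so it also runs where TensorFlow is not installed:

```python
import importlib.util

# Hedged sketch: hide all GPUs from TensorFlow so it cannot grab CUDA
# memory; tf.data input pipelines keep working on the CPU. The guard is
# only so this snippet runs on machines without TensorFlow.
if importlib.util.find_spec("tensorflow") is not None:
    import tensorflow as tf
    tf.config.experimental.set_visible_devices([], "GPU")
```

Note that `set_visible_devices` must be called before TensorFlow initializes its devices (i.e. early in main.py), otherwise it raises a RuntimeError.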
Ok, thank you very much!
Hi, Jaehoon! Did your training speed on 4×A100 improve? After re-checking my experiment, I found it is still quite slow: the utilization of each GPU is only around 50%. Have you found another trick to accelerate training, or could the author @yang-song provide some advice?
Hi, I am trying to run training/evaluation with 4 A100s. However, after some experiments I noticed that the training speed was the same as with a single GPU. Am I missing something?
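A suggestion of my own, not from this thread: first confirm that PyTorch actually sees all four GPUs, since a stray `CUDA_VISIBLE_DEVICES` setting can silently pin training to one device. Also note that `torch.nn.DataParallel`, if that is what the training loop uses, is known to scale much worse than `DistributedDataParallel`. A minimal visibility check, guarded so it also runs on a CPU-only machine:

```python
import importlib.util

def visible_gpus() -> int:
    """Number of CUDA devices PyTorch can see; 0 on a CPU-only box."""
    # Guard so the check also works where torch is not installed.
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch
    return torch.cuda.device_count() if torch.cuda.is_available() else 0

print(f"PyTorch sees {visible_gpus()} GPU(s)")
```

If this prints fewer than 4 on your machine, the problem is environment configuration rather than the training code.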