chandlerbing65nm opened 2 years ago
Hi. Have you managed to resolve this issue? I am currently experiencing the same problem where using multiple GPUs results in each GPU having the same memory usage as when using a single GPU. If you have any solutions or suggestions, could you please share them with me? Thank you very much!
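For what it's worth, flat per-GPU memory is the expected behavior with PyTorch `DistributedDataParallel`: each process holds a full model replica and its own per-GPU batch, so memory per GPU matches a single-GPU run while throughput (samples per second) is what should scale. An epoch only gets shorter if the dataset is actually sharded across ranks with `DistributedSampler`. Below is a minimal, hypothetical sketch (not taken from this repo) showing that setup; it runs as a single CPU process with the `gloo` backend so it works anywhere, and under `torchrun` with more ranks each GPU would receive its own shard.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Standalone demo: world_size=1 on CPU with gloo; under torchrun these
# env vars and the rank/world_size come from the launcher instead.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Toy dataset of 32 samples standing in for the real one.
dataset = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))

# Without DistributedSampler every rank iterates the FULL dataset, so
# adding GPUs does not shorten an epoch. With it, each rank sees roughly
# len(dataset) / world_size samples, while the per-GPU batch size (and
# hence per-GPU memory) stays the same.
sampler = DistributedSampler(
    dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=True,
)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

model = DDP(torch.nn.Linear(10, 2))  # one full model replica per process
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

sampler.set_epoch(0)  # makes shuffling differ across epochs
for x, y in loader:
    opt.zero_grad()
    loss_fn(model(x), y).backward()  # DDP all-reduces gradients here
    opt.step()

print(len(sampler))  # samples this rank sees per epoch
dist.destroy_process_group()
```

With `world_size=1` the sampler yields all 32 samples; with 2 ranks each would see 16, which is where the epoch-time reduction comes from.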
When I tried distributed training on 2 RTX A100 GPUs with a batch size of 4 images per GPU, the training time did not decrease. When I change the batch size to 8 images per GPU, I get this error: