nv-tlabs / GET3D


CUDA out of memory #114

Closed chenglong0313 closed 1 year ago

chenglong0313 commented 1 year ago

I use 8 × V100 GPUs with 32 GB of memory each. Training with the default parameters reports an OOM error right away. I reduced the batch size to 8 and it still reports this error. I can only train on a single card, and in that case the GPU memory usage is only about 8 GB. My environment was installed according to the repository's requirements. I run on a Slurm cluster and I set the dataloader workers to 1. After training starts, the memory usage on GPU 0 increases a lot more than on the other cards. Where is the problem?

env: python=3.8.13 pytorch=1.9.0 kaolin==0.14.0a0 cuda=11.1

RuntimeError: CUDA out of memory. Tried to allocate 1.07 GiB (GPU 2; 31.75 GiB total capacity; 655.12 MiB already allocated; 1.04 GiB free; 1.72 GiB reserved in total by PyTorch)
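For reference, a minimal diagnostic sketch (plain PyTorch calls, not GET3D code; the helper name `log_gpu_memory` is just for illustration) that each rank can call once per step to check whether GPU 0 really keeps accumulating memory while the other ranks stay flat:

```python
import torch
import torch.distributed as dist

def log_gpu_memory(step):
    # Each process reports on the device it actually trains on.
    rank = dist.get_rank() if dist.is_initialized() else 0
    device = torch.cuda.current_device()
    allocated = torch.cuda.memory_allocated(device) / 1024 ** 2   # MiB currently held by tensors
    reserved = torch.cuda.memory_reserved(device) / 1024 ** 2     # MiB held by the caching allocator
    peak = torch.cuda.max_memory_allocated(device) / 1024 ** 2    # peak MiB since the last reset
    print(f"step {step} | rank {rank} | cuda:{device} | "
          f"allocated {allocated:.0f} MiB | reserved {reserved:.0f} MiB | peak {peak:.0f} MiB")
```

If only rank 0 grows, that usually points at something being created on GPU 0 by every process (e.g. tensors moved to the default device instead of the per-rank device), rather than at the batch size itself.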

SteveJunGao commented 1 year ago

Hmm, this looks weird to me; I can always train the model on 16 GB V100 GPUs.

Have you modified any part of the code? (I know some parts of the code can increase memory usage a lot if modified.)

SteveJunGao commented 1 year ago

Closing this issue as I haven't heard back for three months. Please feel free to reopen it if you still see the problem!