hulianyuyy / CorrNet

Continuous Sign Language Recognition with Correlation Network (CVPR 2023)

GPU Memory and Training Problem with Batch Size 2 #6

fransisca25 closed this issue 1 year ago

fransisca25 commented 1 year ago

Hi, I have a problem training the model with batch size 2. I keep getting this error:

RuntimeError: CUDA out of memory. Tried to allocate 488.00 MiB (GPU 0; 9.77 GiB total capacity; 6.90 GiB already allocated; 304.38 MiB free; 6.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My GPU is a GeForce RTX 3080 with 10 GB of memory, so I think I cannot train the model due to a lack of GPU memory. I can train the model by cutting the batch size to 1, but I am afraid that might affect the training results. Do you have any advice on how to fix this problem without changing my GPU?
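As a side note, the error itself suggests setting max_split_size_mb through PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. A minimal sketch of that hint in Python, assuming it runs before PyTorch initializes CUDA (the 128 MiB value is only an illustrative guess, not a recommendation from this repo):

```python
import os

# Assumption: this runs before the first CUDA allocation in the process,
# otherwise the allocator ignores it. 128 MiB is an illustrative value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so CUDA init sees it
```

This only mitigates fragmentation, though; it does not shrink what the model itself needs.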

hulianyuyy commented 1 year ago

You can try the gradient accumulation trick here to expand the effective batch size.
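For reference, a minimal sketch of gradient accumulation in PyTorch; the model, loss, and loader below are illustrative placeholders, not the ones used in this repo:

```python
import torch
import torch.nn as nn

# Illustrative placeholders; substitute the real CorrNet model, loss, and loader.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(1, 10), torch.randint(0, 2, (1,))) for _ in range(8)]

accum_steps = 2  # effective batch size = per-step batch size * accum_steps

optimizer.zero_grad()
for i, (data, label) in enumerate(loader):
    loss = criterion(model(data), label)
    (loss / accum_steps).backward()  # scale so accumulated gradients average
    if (i + 1) % accum_steps == 0:
        optimizer.step()             # one update per accum_steps mini-batches
        optimizer.zero_grad()
```

With batch size 1 and accum_steps = 2, the accumulated gradient matches that of batch size 2, except for batch-statistics layers such as BatchNorm, which still see single-sample batches.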

fransisca25 commented 1 year ago

Okay, I will look into it. Thank you for answering my question!