Open cvKDean opened 5 years ago
Hi,
I have the same issue. After the first epoch I get: RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 8.00 GiB total capacity; 6.43 GiB already allocated; 0 bytes free; 6.53 GiB reserved in total by PyTorch)
I am running the mapping challenge dataset.
I have experimented with varying batch sizes and the number of workers, but the problem occurs no matter the settings.
Update: significantly reducing the batch size (from 20 to 8) has solved the issue for me.
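For anyone hitting the same wall: batch size is the main lever on activation memory, since every sample in the batch keeps its activations alive for the backward pass. A minimal sketch of where the setting takes effect in a plain PyTorch loop (the tensors below are placeholders, not this repo's actual dataset classes):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the mapping-challenge images;
# the repo's actual Dataset classes are not shown here.
images = torch.randn(64, 3, 256, 256)
masks = torch.randint(0, 2, (64, 256, 256))
dataset = TensorDataset(images, masks)

# batch_size dominates activation memory: each training step keeps
# activations for every sample in the batch, so 8 instead of 20
# cuts that part of the footprint by more than half.
loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=0)
```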
Good day, I would just like to ask if you have any idea why I am running into CUDA memory errors during training. This happens at the end of the first epoch (epoch 0). For reference, I am just trying to reproduce the results in REPRODUCE_RESULTS.md with the smaller dataset (annotation-small.json).
My configuration is:
OS: Windows 10 (Anaconda Prompt)
GPU: GeForce GTX 1070 Ti (single)
torch version: 1.0.1
The error stack is as follows:
Lowering the batch size from the default 20 to 10 reduced GPU memory usage from ~6 GB to ~4 GB during epoch 0, but at the end of epoch 0 usage jumped back up to ~6 GB, and subsequent epochs have continued training at ~6 GB.
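To pin down exactly where the jump happens, it can help to log allocator statistics at epoch boundaries. A small helper, assuming a single GPU (device 0); both functions exist in torch 1.0:

```python
import torch

def log_gpu_memory(tag):
    # Current and peak tensor allocations on GPU 0, in GiB.
    # Note: this excludes cached-but-free blocks held by the allocator.
    alloc = torch.cuda.memory_allocated(0) / 1024 ** 3
    peak = torch.cuda.max_memory_allocated(0) / 1024 ** 3
    print("[{}] allocated: {:.2f} GiB, peak: {:.2f} GiB".format(tag, alloc, peak))

# Call at epoch boundaries, e.g.:
# log_gpu_memory("end of epoch 0")
```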
Is this behavior expected/normal? I read somewhere that you also used GTX 1070 GPUs for training, so I thought I would be able to run training at the default batch size. Also, is it normal for GPU memory usage to increase between epochs 0 and 1? Thank you!
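On the epoch 0 → 1 jump: a common explanation (not verified against this repo's pipeline) is that validation runs for the first time at the end of epoch 0, so its buffers get allocated then and stay reserved by PyTorch's caching allocator, or that losses are accumulated as tensors with their autograd graphs still attached. A sketch of the usual mitigations in a generic validation loop, with placeholder model/loader arguments:

```python
import torch
import torch.nn.functional as F

def validate(model, loader, device="cuda"):
    model.eval()
    total = 0.0
    # no_grad() stops autograd from keeping activations alive, which
    # is the usual cause of an extra memory jump at the first
    # validation pass (end of epoch 0).
    with torch.no_grad():
        for x, y in loader:
            out = model(x.to(device))
            loss = F.cross_entropy(out, y.to(device))
            # .item() converts to a Python float; accumulating the
            # tensor itself would retain graph state across steps.
            total += loss.item()
    model.train()
    return total / len(loader)
```

If the jump comes only from allocator caching rather than a leak, usage should plateau after epoch 1, which matches the steady ~6 GB you describe.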