kkothari93 opened this issue 3 years ago (status: Open)
Yeah, I'm likewise not seeing the memory requirements scale down with lower batch sizes on some experiments. I run out of memory with batch size 1 on `train_img.py` (I have a 6 GB GPU).
Batch-size memory scaling does work for `train_sdf.py` (point cloud); I'm able to get that under 6 GB with a batch size of 100000.
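In case it helps to compare setups, here is a minimal sketch (not the repo's training loop; the tiny model and random coordinates below are just stand-ins) of how peak memory can be checked against batch size with `torch.cuda.max_memory_allocated`:

```python
import torch

# Hypothetical stand-in model; the repo's actual networks and dataloaders are not reproduced here.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 256), torch.nn.ReLU(), torch.nn.Linear(256, 1)
).cuda()

for batch_size in (1_000, 10_000, 100_000):
    torch.cuda.reset_peak_memory_stats()
    coords = torch.rand(batch_size, 2, device="cuda")  # fake input batch
    loss = model(coords).pow(2).mean()
    loss.backward()
    peak_mib = torch.cuda.max_memory_allocated() / 2**20
    print(f"batch_size={batch_size:>7}: peak {peak_mib:.1f} MiB")
```

If the reported peak stays roughly flat as the batch size drops, the footprint is dominated by something other than the per-batch tensors.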
Same here. I have also tried reducing the batch size in `train_inverse_helmholtz.py`, to no avail. I'm also running a 32 GB GPU and getting a CUDA out-of-memory error.
Hi, I'm getting exactly the same problem. I tried using the Python garbage collector (i.e. `gc.collect()`) and `torch.cuda.empty_cache()`, but it still crashes with OOM. @vsitzmann any suggestions?
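For reference, this is roughly what I mean (a sketch, not code from the repo): `empty_cache()` only returns PyTorch's cached-but-unused blocks to the driver, so it usually doesn't help when the memory is held by live tensors, which `torch.cuda.memory_summary` can show.

```python
import gc
import torch

def release_cached_memory():
    # gc.collect() drops Python-level garbage; empty_cache() returns PyTorch's
    # cached (unused) blocks to the driver. Neither can free tensors that are
    # still referenced, e.g. an autograd graph kept alive by a stored loss,
    # which is why these calls often don't change a genuine OOM.
    gc.collect()
    torch.cuda.empty_cache()

print(torch.cuda.memory_summary(abbreviated=True))  # allocated vs. reserved, before
release_cached_memory()
print(torch.cuda.memory_summary(abbreviated=True))  # and after
```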
From the `experiment_scripts/` folder, I try to run the `train_inverse_helmholtz.py` experiment as follows: `python3 train_inverse_helmholtz.py --experiment_name fwi --batch_size 1`
Section 5.3 of the paper's supplementary material states that a single 24 GB GPU was used for this experiment, whereas I am using a 32 GB V100, which should be sufficient. However, even with a batch size of 1 I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 31.75 GiB total capacity; 28.32 GiB already allocated; 11.75 MiB free; 30.49 GiB reserved in total by PyTorch)
Here is the full trace:
Can you please help?
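One detail that might help with debugging: in the error above, roughly 28 GiB is already allocated (and about 30 GiB reserved by the caching allocator) before a tiny 14 MiB request fails, so it could be worth logging memory per step to see whether the footprint is that large from the first iteration or grows over time. A hedged sketch follows; the training loop is hypothetical and only stands in for whatever `train_inverse_helmholtz.py` does internally.

```python
import torch

def log_cuda_memory(step, device=0):
    # allocated = memory held by live tensors; reserved = what PyTorch's caching
    # allocator has claimed from the driver (the 30.49 GiB in the error above).
    alloc = torch.cuda.memory_allocated(device) / 2**30
    reserved = torch.cuda.memory_reserved(device) / 2**30
    print(f"step {step}: allocated {alloc:.2f} GiB, reserved {reserved:.2f} GiB")

# Hypothetical loop; model, dataloader, loss_fn and optimizer are placeholders.
# for step, (model_input, gt) in enumerate(dataloader):
#     optimizer.zero_grad()
#     loss = loss_fn(model(model_input), gt)
#     loss.backward()
#     optimizer.step()
#     log_cuda_memory(step)   # store only loss.item() for logging, not the loss tensor itself
```

If the allocated number is already near the limit at step 0, the model or grid evaluation itself is too large; if it climbs step by step, something (often a non-detached tensor) is keeping old graphs alive.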