manogna-s opened this issue 9 months ago
Thanks for your interest in our work!
To optimize the GPU memory footprint, you can follow this thread and try `optimizer.zero_grad(set_to_none=True)`.
Another workaround is to decrease `input.batch_size` and increase `tta.gradient_descent.accum_iter` to maintain the same total batch size. For example: change `input.batch_size=15` and `tta.gradient_descent.accum_iter=12` to `input.batch_size=12` and `tta.gradient_descent.accum_iter=15`. However, since the initialization of the random Gaussian noise changes with `input.batch_size`, it is not guaranteed to reproduce the same numbers with a different `input.batch_size`.
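For anyone else hitting this, the two workarounds above can be sketched in a minimal PyTorch loop. Note this is an illustrative example, not code from the Diffusion-TTA repo; the model, loss, and variable names are placeholders:

```python
# Minimal sketch of gradient accumulation with set_to_none=True.
# (Illustrative only -- not the Diffusion-TTA training loop.)
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_size = 12   # smaller per-step batch (was 15)
accum_iter = 15   # more accumulation steps (was 12) -> same total of 180

optimizer.zero_grad(set_to_none=True)  # drops .grad tensors entirely
for step in range(accum_iter):
    x = torch.randn(batch_size, 4)            # placeholder input batch
    loss = model(x).pow(2).mean() / accum_iter  # scale loss so the
    loss.backward()                             # accumulated gradient
                                                # matches the big batch
optimizer.step()
optimizer.zero_grad(set_to_none=True)  # .grad is None again; the gradient
                                       # buffers can be freed, unlike with
                                       # set_to_none=False which zeros them
                                       # in place and keeps the memory
```

With `set_to_none=False` (the old default) the gradient tensors are kept allocated and merely zeroed, so the memory saved by `set_to_none=True` is roughly one full copy of the model's parameters.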
I am also facing the same issue. Could you tell me about the hardware you were using? Here is what my `nvidia-smi` output looks like:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02 Driver Version: 470.223.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:B3:00.0 Off | N/A |
| 0% 37C P8 15W / 350W | 17MiB / 24267MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1316 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 1531 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
```
Hello, our experiments were conducted on an NVIDIA V100/A100/RTX 6000 (32/40/48 GB) graphics card.
Thanks to the authors for providing the code. This is really interesting work!
I am trying to run this code on an NVIDIA RTX A5000 (24 GB). I observed that the initial GPU memory allocated before any gradients (at the 0th step) is 7 GB. The GPU memory stays at 19 GB through step 14 while the gradients are being accumulated. Once it enters the if condition below and `optimizer.zero_grad()` is performed, I expected the memory to drop back to 7 GB. However, the allocated GPU memory only drops to 13 GB, and on the 15th step it runs out of memory (requiring more than 24 GB). I localised the issue to the following block (line 128 in
https://github.com/mihirp1998/Diffusion-TTA/blob/main/diff_tta/engine.py
). Any suggestions on resolving this issue would be great :)
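One way to narrow this down is to log `torch.cuda.memory_allocated()` right before and after the `zero_grad()` call: if the number does not drop, something other than the gradient buffers (for instance a stored loss or activation tensor that still references the autograd graph) is holding the memory. A minimal diagnostic sketch, not from the repo (model and names are placeholders; it falls back to CPU when no GPU is present, in which case the memory stats are just zero):

```python
# Diagnostic sketch: check whether zero_grad() actually releases memory.
# (Placeholder model -- adapt the logging calls to the real loop.)
import torch

def log_mem(tag):
    # memory_allocated() reports tensors currently held by PyTorch,
    # excluding the caching allocator's free blocks.
    used = torch.cuda.memory_allocated() if torch.cuda.is_available() else 0
    print(f"{tag}: {used / 2**20:.1f} MiB allocated")
    return used

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(256, 256).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 256, device=device)
model(x).pow(2).mean().backward()
log_mem("after backward")

opt.zero_grad(set_to_none=True)
if torch.cuda.is_available():
    # Returns cached blocks to the driver so nvidia-smi reflects the drop;
    # not needed for correctness, only for inspection.
    torch.cuda.empty_cache()
log_mem("after zero_grad")
```

If memory still does not drop after `zero_grad(set_to_none=True)`, `del`-ing any variables that reference the loss or intermediate outputs before the next step is worth trying, since a single live reference keeps the whole accumulated graph alive.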