mihirp1998 / Diffusion-TTA

Diffusion-TTA improves pre-trained discriminative models such as image classifiers or segmentors using pre-trained generative models.
https://diffusion-tta.github.io

CUDA out of memory issue #2

Open manogna-s opened 9 months ago

manogna-s commented 9 months ago

Thanks to the authors for providing the code. This is really interesting work!

I am trying to run this code on an NVIDIA RTX A5000. I observed that the initial GPU memory allocated before any gradients (step 0) is 7 GB. The memory then stays around 19 GB up to step 14, while gradients are being accumulated. Once it enters the if block below and optimizer.zero_grad() is called, I expected the memory to drop back to 7 GB. However, the allocated memory only goes down to 13 GB, and at step 15 it runs out of memory (requiring more than 24 GB). I localised the issue to the following block (line 128 in https://github.com/mihirp1998/Diffusion-TTA/blob/main/diff_tta/engine.py). Any suggestions on resolving this issue would be great :)

        scaler.scale(loss).backward()
        if ((step + 1) % config.tta.gradient_descent.accum_iter == 0):
            scaler.step(optimizer)
            optimizer.zero_grad()
            scaler.update()
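
For anyone hitting the same issue, here is a minimal sketch of how the per-step allocation can be tracked around this block. The log_cuda_memory helper is hypothetical and only illustrative; the other names (scaler, loss, step, optimizer, config) follow the snippet above:

    import torch  # already imported in engine.py

    def log_cuda_memory(tag):
        # Illustrative helper: report allocated/reserved CUDA memory in GB
        alloc_gb = torch.cuda.memory_allocated() / 1024**3
        reserved_gb = torch.cuda.memory_reserved() / 1024**3
        print(f"[{tag}] allocated={alloc_gb:.1f} GB, reserved={reserved_gb:.1f} GB")

    log_cuda_memory(f"step {step}: before backward")
    scaler.scale(loss).backward()
    if (step + 1) % config.tta.gradient_descent.accum_iter == 0:
        scaler.step(optimizer)
        optimizer.zero_grad()
        scaler.update()
        log_cuda_memory(f"step {step}: after optimizer step")
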
buttomnutstoast commented 9 months ago

Thanks for your interest in our work!

To reduce the GPU memory footprint, you can follow this thread and try:

optimizer.zero_grad(set_to_none=True)
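
Applied to the block you quoted, this would look roughly like the following (a sketch only; everything except the zero_grad call is unchanged):

    scaler.scale(loss).backward()
    if (step + 1) % config.tta.gradient_descent.accum_iter == 0:
        scaler.step(optimizer)
        # set_to_none=True replaces the .grad tensors with None instead of
        # zero-filling them, so their memory can actually be released
        optimizer.zero_grad(set_to_none=True)
        scaler.update()
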

Another workaround is to decrease input.batch_size and increase tta.gradient_descent.accum_iter so that the total batch size stays the same, for example changing input.batch_size=15, tta.gradient_descent.accum_iter=12 to input.batch_size=12, tta.gradient_descent.accum_iter=15. However, since the initialization of the random Gaussian noise depends on input.batch_size, results are not guaranteed to reproduce exactly with a different input.batch_size.
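
Both settings keep the same effective batch size, because gradients from accum_iter micro-batches are accumulated before each optimizer step:

    # effective batch size = input.batch_size * tta.gradient_descent.accum_iter
    assert 15 * 12 == 12 * 15 == 180  # both configs accumulate over 180 samples per update
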

goirik-chakrabarty commented 9 months ago

I am also facing the same issue. Could you tell me which hardware you were using? Here is what my nvidia-smi output looks like:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:B3:00.0 Off |                  N/A |
|  0%   37C    P8    15W / 350W |     17MiB / 24267MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1316      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      1531      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+
buttomnutstoast commented 9 months ago

Hello, our experiments were conducted on NVIDIA V100/A100/RTX 6000 (32/40/48 GB) graphics cards.