huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Memory leak when using CLIPTextModel #31439

Open minsuk00 opened 2 months ago

minsuk00 commented 2 months ago

System Info

Who can help?

Information

Tasks

Reproduction

I can't free GPU memory after using CLIPTextModel. Additionally, memory is allocated on another device for some reason.

The problem can be reproduced with the following code snippet:

from transformers import CLIPTextModel
import torch

# Load the CLIP text encoder onto the second GPU
clip_text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to("cuda:1")

# Deleting the model and emptying the cache does not release the GPU memory
del clip_text_model
torch.cuda.empty_cache()

Expected behavior

I expect the GPU memory to be freed after deleting the model. I've also tried running garbage collection and explicitly moving the model to the CPU, but neither frees the memory.
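For what it's worth, a common cause of this symptom (in any Python program, not specific to this model) is that some other reference still points at the object, so `del` and `gc.collect()` can never free it. A minimal stdlib-only sketch, using a hypothetical stand-in class rather than the real model:

```python
import gc
import weakref

class Model:
    """Stand-in for a large model object (hypothetical placeholder)."""
    pass

model = Model()
ref = weakref.ref(model)   # observe liveness without keeping the object alive
cache = [model]            # a stray reference, e.g. held by a closure or cache

del model
gc.collect()
assert ref() is not None   # still alive: `cache` holds a reference

cache.clear()
gc.collect()
assert ref() is None       # only now is the object actually collected
```

If something like a notebook output, an exception traceback, or a framework-level cache still references the model, no amount of `empty_cache()` will help until that reference is dropped.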

younesbelkada commented 2 months ago

Hi @minsuk00, you can also try the release_memory utility method from accelerate.utils; cc @muellerzr

minsuk00 commented 2 months ago

@younesbelkada Thanks for the suggestion, but it doesn't seem to work: clip_text_model = accelerate.utils.release_memory(clip_text_model) does not free any GPU memory.

Additionally, calling clip_text_model.cpu() or torch.cuda.empty_cache() simply results in the behavior described above.

amyeroberts commented 1 month ago

cc @muellerzr regarding the accelerate behaviour.

Regarding torch.cuda.empty_cache(), it's recommended not to call this function manually; c.f. a related issue, and this discussion in the PyTorch forum.
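One way to see why: torch.cuda.empty_cache() only returns *cached* allocator blocks to the driver; it cannot touch memory backing tensors that are still referenced from Python. The effective pattern is to drop every reference first, then collect, then (optionally) empty the cache. A CPU-safe sketch, using a small torch.nn.Linear as a stand-in for the CLIP model:

```python
import gc
import weakref

import torch

model = torch.nn.Linear(8, 8)  # stand-in for CLIPTextModel
tracker = weakref.ref(model)   # observe liveness without a strong reference

model = None                   # drop the last Python reference
gc.collect()                   # collect any reference cycles
if torch.cuda.is_available():
    torch.cuda.empty_cache()   # only now can the CUDA cache actually shrink

assert tracker() is None       # the module (and its parameter storage) is gone
```

Calling empty_cache() before the references are gone is a no-op for the leaked memory, which matches the behaviour reported above.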