lstein opened this issue 1 week ago
Thanks for reporting, I can replicate the issue as you described. Some further tests that I did:

- When loading `AutoModelForCausalLM.from_pretrained("facebook/opt-125m")` instead, memory is also not freed, whether with or without bnb.

The last point made me wonder if the measurement is somehow incorrect. Adding some sleep time made no difference though.
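A minimal sketch of that follow-up test (reconstructed from the description above as an assumption, not the commenter's actual code) could look like:

```python
# Sketch of the follow-up test: load a small model *without* quantization,
# delete it, and check whether the CUDA allocation counter drops.
import gc
import time

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").to("cuda")
print(f"VRAM usage={torch.cuda.memory_allocated()}")

del model
gc.collect()
torch.cuda.empty_cache()
time.sleep(10)  # extra sleep, per the comment above; it made no difference
print(f"VRAM usage={torch.cuda.memory_allocated()}")
```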
---

### System Info

### Information

### Tasks

- An officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

### Reproduction
I am part of the InvokeAI development team (www.invoke.ai), and I am trying to provide support for the Stable Diffusion 3 text-to-image model. This task requires me to be able to sequentially load and unload portions of the generation pipeline into VRAM on CUDA systems.
After quantizing the Hugging Face model `T5EncoderModel` using `load_in_8bit`, I cannot remove the model from VRAM. This appears to be related to the issue reported at https://github.com/huggingface/transformers/issues/21094; however, none of the solutions proposed there work for me. The following script illustrates the issue:
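(What follows is a sketch reconstruction of the reproduction; the checkpoint name, the `subfolder` argument, and the VRAM probe are assumptions based on the description, not the original script.)

```python
# Sketch reconstruction of the reproduction script (assumptions: a CUDA
# device, bitsandbytes installed, and the SD3 text-encoder checkpoint below).
import gc

import torch
from transformers import BitsAndBytesConfig, T5EncoderModel

def vram_usage() -> int:
    """Bytes of CUDA memory currently held by tensors."""
    torch.cuda.synchronize()
    return torch.cuda.memory_allocated()

model = T5EncoderModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed checkpoint
    subfolder="text_encoder_3",                         # assumed subfolder
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="cuda",
)
print(f"VRAM usage={vram_usage()}")

# Attempt to unload the model and release its VRAM.
del model
gc.collect()
torch.cuda.empty_cache()
print(f"VRAM usage={vram_usage()}")  # stays non-zero with load_in_8bit
```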
### Expected behavior

The output shows VRAM usage remaining non-zero after the model is deleted. The expected output is for the last line to read `VRAM usage=0`. In fact, when I comment out the `quantization_config` parameter, the VRAM is indeed released.
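For comparison, a sketch of that control case (same assumed checkpoint as above):

```python
# Control case (sketch): identical load without quantization_config; here
# deleting the model does release VRAM, per the report above.
import gc

import torch
from transformers import T5EncoderModel

model = T5EncoderModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed checkpoint
    subfolder="text_encoder_3",                         # assumed subfolder
).to("cuda")
print(f"VRAM usage={torch.cuda.memory_allocated()}")

del model
gc.collect()
torch.cuda.empty_cache()
print(f"VRAM usage={torch.cuda.memory_allocated()}")  # reads 0
```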