huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Can not free GPU memory after Trainer.train() a Peft lora model #2221

Closed Deno-V closed 1 week ago

Deno-V commented 1 week ago

System Info

peft 0.13.2, accelerate 1.1.0, torch 2.4.0, trl 0.12.0, Python 3.10.15, Linux server

Who can help?

@BenjaminBossan @sayakpaul


Reproduction

import torch
import os
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer
from datasets import Dataset
from peft import LoraConfig, PeftModel
os.environ["CUDA_VISIBLE_DEVICES"] = "3"
os.environ["WANDB_DISABLED"] = "true"
os.environ["NCCL_P2P_DISABLE"] = "1"
os.environ["NCCL_IB_DISABLE"] = "1"
device_map = {"":0}
print("::::Before Loading Model, allocated GPU memory:",torch.cuda.memory_allocated())
pretrained_path = 'meta-llama/Meta-Llama-3-8B-Instruct'
#################### Prepare Data
tokenizer = AutoTokenizer.from_pretrained(pretrained_path)
tokenizer.pad_token = tokenizer.eos_token
data = {  'input_ids': [  
        tokenizer.encode("Hello, how are you?"),  
        tokenizer.encode("I am fine, thank you!")]}
train_dataset = Dataset.from_dict(data)

###################### Prepare Model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=False,
)
peft_config = LoraConfig(
    lora_alpha=32,
    lora_dropout=0.1,
    r=32,
    bias="none",
    task_type="CAUSAL_LM",
)
print('start loading')  
import time
time.sleep(2)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_path,
    quantization_config = bnb_config,
    device_map = device_map,
)
model = PeftModel(model,peft_config)
print("::::After Loading Model, allocated GPU memory:",torch.cuda.memory_allocated())

######################## Train model
training_arguments = TrainingArguments(
    report_to=None,
    output_dir='.',
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy = 'no',
    logging_steps=1,
    learning_rate=2e-4,
    fp16=True,
    max_steps=2,)
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    max_seq_length=2048,
    tokenizer=tokenizer,
    args=training_arguments
)
trainer.train()
print("::::After Training Model, allocated GPU memory:",torch.cuda.memory_allocated())

######################## Try to free GPU
import gc
del model
del trainer
for _ in range(9):
    torch.cuda.empty_cache()
    gc.collect()
print("::::After Free Memory, allocated GPU memory:",torch.cuda.memory_allocated())

Expected behavior

I expect the code to print the following GPU memory allocations:

- Before loading the model: 0
- After loading the model: 6115435008
- After training the model: slightly higher, 6115435008 + something
- After empty_cache() and garbage collection: very low, 0~5000 (maybe?)

However, the code prints the following results:

::::Before Loading Model, allocated GPU memory: 0
::::After Loading Model, allocated GPU memory: 6115435008
::::After Training Model, allocated GPU memory: 6132475392
::::After Free Memory, allocated GPU memory: 6132474368
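
For reference, torch.cuda.memory_allocated() only counts memory backing live tensors, while torch.cuda.memory_reserved() shows what the caching allocator holds; empty_cache() can only return the reserved-but-unallocated part. A tiny helper (the name is just illustrative, not part of the reproduction) that prints both:

import torch

def report_cuda_memory(tag):
    # allocated = memory occupied by live tensors; reserved = memory held by
    # PyTorch's caching allocator; empty_cache() can only shrink the reserved pool
    print(f"[{tag}] allocated={torch.cuda.memory_allocated()} reserved={torch.cuda.memory_reserved()}")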

From these results, I see that I cannot free the memory after calling train().

I have tried several variations: if I do not call train(), the GPU memory can be freed normally.

I have to run further processing after training, so this memory consumption accumulates: if I call train() n times, the allocated memory grows roughly n-fold. That is bad!

How can I free this memory? I even suspect this is a severe bug.
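
For completeness, here is a more aggressive cleanup sketch (it assumes the remaining references live on the trainer's optimizer and model attributes, which is a guess on my part rather than something I have verified):

import gc
import torch

# Drop the optimizer state and model references held by the trainer first,
# then the local references, then collect and release cached blocks.
trainer.optimizer = None
trainer.model = None
del trainer
del model
gc.collect()
torch.cuda.empty_cache()
print("allocated after cleanup:", torch.cuda.memory_allocated())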

BenjaminBossan commented 1 week ago

Thanks for reporting and providing a reproducer. I ran it on my machine and these are my logs (omitting fluff):

::::Before Loading Model, allocated GPU memory: 0
::::After Loading Model, allocated GPU memory: 6115435008
{'loss': 2.8368, 'grad_norm': nan, 'learning_rate': 0.0002, 'epoch': 0.5}                                                                                                                                                                                                           
{'loss': 3.83, 'grad_norm': nan, 'learning_rate': 0.0002, 'epoch': 1.0}                                                                                                                                                                                                             
{'train_runtime': 0.5141, 'train_samples_per_second': 3.89, 'train_steps_per_second': 3.89, 'train_loss': 3.3334261178970337, 'epoch': 1.0}                                                                                                                                         
::::After Training Model, allocated GPU memory: 6132475392
::::After Free Memory, allocated GPU memory: 17039360

So for me, some memory is also left after clearing the cache, but only a little, whereas for you it is basically the same as before clearing. I'm not sure what's going on here. Could you try updating to the latest versions of PEFT, transformers, trl, accelerate, and torch?
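
Something like this (just a quick sketch) prints the installed versions so we can compare against the latest releases:

from importlib.metadata import version

# Print the installed version of each relevant package.
for pkg in ["peft", "transformers", "trl", "accelerate", "torch", "bitsandbytes"]:
    print(pkg, version(pkg))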

Deno-V commented 1 week ago

Thanks, problem solved.

I updated these packages and found that it was due to transformers (4.46.1). After updating it to 4.46.3, the problem is solved (same results as you got).
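
In case it helps anyone hitting the same thing, a quick guard at the top of the script (just a sketch; packaging is already a dependency of transformers, so the import should be available) catches the affected version early:

from packaging.version import Version
import transformers

# The leak showed up for me with transformers 4.46.1 and was gone after 4.46.3.
assert Version(transformers.__version__) >= Version("4.46.3"), transformers.__version__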

I spent half a day on it and never thought it could be transformers... hahaha :(

I tried a non-PEFT model before and did not notice this problem, so I wrongly took it for a bug in PEFT. Sorry~

BenjaminBossan commented 1 week ago

I'm glad that this solved the issue for you.

I tried a non-PEFT model before and did not notice this problem, so I wrongly took it for a bug in PEFT. Sorry~

It could have been some strange interaction between PEFT and transformers that caused it. As this is now patched, though, I don't think it's worth investigating further. I'll close the issue, but if anything new comes up, feel free to re-open.