Closed: tomekrut closed this issue 1 month ago
I can reproduce the error. My suspicion is that it's somehow related to the `Trainer` class. When I wrote a vanilla PyTorch training loop, I saw the same memory consumption with and without wrapping.
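For reference, the comparison I ran was along these lines (a minimal sketch; the optimizer, hyperparameters, and batch format are placeholders, and `ModelWrap` stands in for the wrapper from the reproduction script):

```python
import torch

def peak_memory_after_steps(model, batches, steps=10):
    """Run a plain training loop and report the peak GPU memory in GiB."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    torch.cuda.reset_peak_memory_stats()
    for _, batch in zip(range(steps), batches):
        loss = model(**batch).loss  # assumes the model returns a loss when given labels
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    return torch.cuda.max_memory_allocated() / 2**30
```

Calling this once with the bare model and once with `ModelWrap(model)` gave comparable peaks, which is what points me at `Trainer` rather than the wrapper itself.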
I also tried passing the non-PEFT model directly to `SFTTrainer` while passing `peft_config=lora_config`, since `SFTTrainer` knows how to deal with PEFT, but this made little difference.
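Concretely, that variant looks roughly like this (a sketch; the `lora_config` values and `train_dataset` are placeholders):

```python
from peft import LoraConfig
from trl import SFTTrainer

lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # placeholder values

trainer = SFTTrainer(
    model=base_model,         # the plain, non-PEFT model
    train_dataset=train_dataset,
    peft_config=lora_config,  # let SFTTrainer apply PEFT internally
)
```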
Another thing I tried is to first create the PEFT model and then wrap it with `ModelWrap`. This reduces memory consumption.
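The ordering that behaved better is roughly this (again a sketch, reusing the placeholder names from the previous snippet):

```python
from peft import get_peft_model

peft_model = get_peft_model(base_model, lora_config)  # apply PEFT first
model = ModelWrap(peft_model)                         # then wrap the PEFT model
trainer = SFTTrainer(model=model, train_dataset=train_dataset)
```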
As I'm not an expert when it comes to `(SFT)Trainer`, it's hard for me to tell what goes wrong. There are a lot of `isinstance` checks within the code, so I wonder if there could be a relation there.
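To illustrate what I mean: a plain `nn.Module` wrapper hides the underlying type, so any type-based branch in the trainer would take the non-PEFT path (illustrative only; I haven't pinned down a specific check, and the names reuse the placeholders from the sketches above):

```python
from peft import PeftModel

wrapped = ModelWrap(get_peft_model(base_model, lora_config))
print(isinstance(wrapped, PeftModel))        # False: the wrapper masks the PEFT model
print(isinstance(wrapped.model, PeftModel))  # True, assuming ModelWrap stores it as .model
```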
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
System Info
Hi guys, I have some complex models where I use only part of the sub-models from transformers. Below I used `AutoModelForCausalLM.from_pretrained()`, but normally it would be something like `LlamaModel.from_pretrained()`. Inside `ModelWrap()` there is plenty of stuff, but I wanted to simplify it here. Whether I use the SFT trainer or my own, GPU memory utilization explodes and I always end up with a GPU OOM. Without the wrapper, an 8B model (8-bit) consumes 32 GB; once I wrap the model, 80 GB is not enough. I have an A100 80GB.
Can you please comment on that? What am I doing wrong?
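A stripped-down version of what I do (the real `ModelWrap` contains much more; the checkpoint name, LoRA config, and dataset here are placeholders):

```python
import torch.nn as nn
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

class ModelWrap(nn.Module):
    """Heavily simplified; the real wrapper holds several sub-models."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, **kwargs):
        return self.model(**kwargs)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder for the actual 8B checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

train_dataset = Dataset.from_dict({"text": ["example"] * 8})          # dummy data
lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # placeholder config

model = ModelWrap(base_model)  # without this wrapper, training fits in 32 GB
trainer = SFTTrainer(model=model, train_dataset=train_dataset, peft_config=lora_config)
trainer.train()
```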
Who can help?
No response
Information

Tasks
An officially supported task in the `examples` folder

Reproduction
Script added
Expected behavior
It should still work within the 32 GB memory threshold.