unslothai / unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Using unsloth mode gradient checkpointing without LoRA #644

Open Robinysh opened 2 weeks ago

Robinysh commented 2 weeks ago

Currently Unsloth offers a customized version of gradient checkpointing that is claimed to be better. The only way I'm aware of to use it is with the code below.

from unsloth import FastLanguageModel

# Unsloth's customized gradient checkpointing is enabled via get_peft_model,
# which also attaches LoRA adapters to the model.
model = FastLanguageModel.get_peft_model(
    model,
    use_gradient_checkpointing = "unsloth",  # <<<<<<<
)

But calling FastLanguageModel.get_peft_model patches the model with LoRA adapters. Is there any way to use Unsloth's customized gradient checkpointing without LoRA? Does it even make sense to use it without LoRA, or are the customized tricks specific to PEFT? My rough mental model of what the "unsloth" mode does is sketched below.
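To be clear, this is only a guess at the general technique (checkpoint a block, but park its saved activation in CPU RAM instead of VRAM), not Unsloth's actual implementation; the class and function names here are hypothetical and untested:

import torch

class OffloadedCheckpoint(torch.autograd.Function):
    """Checkpoint a block, keeping its saved input in CPU RAM instead of VRAM."""

    @staticmethod
    def forward(ctx, run_function, hidden_states):
        ctx.run_function = run_function
        ctx.device = hidden_states.device
        # Park the activation in (ideally pinned) CPU memory.
        ctx.saved_cpu = hidden_states.detach().to("cpu")
        with torch.no_grad():
            return run_function(hidden_states)

    @staticmethod
    def backward(ctx, grad_output):
        # Move the activation back to the GPU and recompute the block with grad enabled.
        hidden_states = ctx.saved_cpu.to(ctx.device).requires_grad_(True)
        with torch.enable_grad():
            output = ctx.run_function(hidden_states)
        torch.autograd.backward(output, grad_output)
        # No gradient for run_function; return the gradient w.r.t. hidden_states.
        return None, hidden_states.grad

# Hypothetical usage: wrap each decoder layer's forward with the Function.
# out = OffloadedCheckpoint.apply(lambda x: layer(x)[0], hidden_states)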

danielhanchen commented 1 week ago

We'll be adding support for all models in a future release, which will enable Unsloth gradient checkpointing for other models! I'm unsure about normal full finetuning or pretraining - for that I would suggest using DeepSpeed to offload, not Unsloth.
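For reference, a CPU-offload setup for full finetuning or pretraining with the Hugging Face Trainer could look roughly like this; the ZeRO stage and batch sizes are only illustrative, not a recommended config:

from transformers import TrainingArguments

# Illustrative ZeRO-2 config with optimizer state offloaded to CPU;
# "auto" lets the HF integration fill in matching values.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": "auto"},
}

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,   # standard PyTorch checkpointing still applies
    deepspeed=ds_config,           # or a path to a JSON file with the same content
)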

Robinysh commented 1 week ago

Great to know it's on the todo list. I'm not looking for offloading techniques, since the performance drop is quite significant; rather, I'm trying to use gradient checkpointing during pretraining. The PyTorch implementation should be good enough for the time being.
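For anyone who lands here later, the plain PyTorch route I mean is just the stock Transformers checkpointing, no LoRA required; a minimal sketch on a recent transformers version (the model name is only an example):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Stock torch.utils.checkpoint-based gradient checkpointing.
# use_reentrant=False selects the newer, more flexible checkpoint path.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
model.config.use_cache = False  # the KV cache is incompatible with checkpointing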