mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0

Gradient checkpointing issue when running QLoRA finetuning #413

Open tytung2020 opened 1 year ago

tytung2020 commented 1 year ago

Finetuning mpt-7b and mpt-30b with QLoRA fails with the error "ValueError: MPTForCausalLM does not support gradient checkpointing.". Is there a way to fix this?
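For context, this error comes from Hugging Face transformers rather than from the QLoRA code itself: `PreTrainedModel.gradient_checkpointing_enable()` raises it whenever the model class doesn't set `supports_gradient_checkpointing`, and PEFT's QLoRA preparation step (`prepare_model_for_kbit_training`) calls it by default. A minimal sketch of how it surfaces (the load flags are illustrative; the key line is the `gradient_checkpointing_enable()` call):

```python
# Sketch of how the ValueError surfaces during QLoRA setup. The load flags
# below are illustrative; the important call is gradient_checkpointing_enable(),
# which PEFT's prepare_model_for_kbit_training() makes by default.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,  # MPT ships custom modeling code on the Hub
)

# MPT's remote modeling code does not set supports_gradient_checkpointing,
# so this raises:
#   ValueError: MPTForCausalLM does not support gradient checkpointing.
model.gradient_checkpointing_enable()
```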

tytung2020 commented 1 year ago

Are these lines of code what is needed to make it work? cekal's amendment seems to work for the 7b version: https://huggingface.co/cekal/mpt-7b-peft-compatible/commit/a5eab52c1c61c1d50a4e01428949f6ff90c73c48 But I'm not sure whether it works fully as intended. Could someone at MosaicML check this? If so, please also implement it for the 30b version. Thanks~
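For reference, here is a runnable toy sketch of the two hooks such a patch has to add, assuming the transformers 4.3x-era gradient-checkpointing API in use when this issue was filed. The Toy* names are illustrative stand-ins for the MPT classes, not code copied from the linked commit:

```python
# Toy model showing the two pieces a patch needs so that
# gradient_checkpointing_enable() succeeds instead of raising ValueError:
#   1. supports_gradient_checkpointing = True on the class
#   2. a _set_gradient_checkpointing hook that flips a flag the forward pass checks
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint
from transformers import PretrainedConfig, PreTrainedModel

class ToyConfig(PretrainedConfig):
    model_type = "toy"

    def __init__(self, d_model=16, n_layers=2, **kwargs):
        super().__init__(**kwargs)
        self.d_model = d_model
        self.n_layers = n_layers

class ToyModel(PreTrainedModel):
    config_class = ToyConfig
    # Without this, gradient_checkpointing_enable() raises
    # "ToyModel does not support gradient checkpointing."
    supports_gradient_checkpointing = True

    def __init__(self, config):
        super().__init__(config)
        self.blocks = nn.ModuleList(
            nn.Linear(config.d_model, config.d_model)
            for _ in range(config.n_layers)
        )
        self.gradient_checkpointing = False

    def _set_gradient_checkpointing(self, module, value=False):
        # Called by PreTrainedModel.gradient_checkpointing_enable()/disable().
        if isinstance(module, ToyModel):
            module.gradient_checkpointing = value

    def forward(self, x):
        for block in self.blocks:
            if self.gradient_checkpointing and self.training:
                # Recompute this block's activations in backward to save memory.
                x = checkpoint(block, x)
            else:
                x = block(x)
        return x

model = ToyModel(ToyConfig())
model.train()
model.gradient_checkpointing_enable()  # now succeeds instead of raising
out = model(torch.randn(2, 16, requires_grad=True))
out.sum().backward()
```

Applying the same pattern to MPTModel/MPTForCausalLM in modeling_mpt.py is essentially what the linked commit appears to do. Note that later transformers releases changed the signature of this internal hook, so the exact wiring depends on the pinned transformers version.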