huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

PEFT with quantized LLaMA #1081

Closed: Sai-Ashish closed this 11 months ago

Sai-Ashish commented 1 year ago

Feature request

I would like to inquire whether prompt tuning is available with a 4-bit quantized LLaMA-2 model.

Motivation

4-bit quantization makes models easier to use and more hardware friendly. Can a 4-bit quantized model be used together with prompt tuning in this library?

Your contribution

I believe it should be compatible, since full-precision LLaMA-2 (CausalLM) is compatible with prompt tuning.

SumanthRH commented 1 year ago

Not a maintainer, but I'll chime in: prompt tuning with 4-bit quantization should work! Prompt tuning only modifies the input embeddings fed into the transformer, so all the necessary dtype casting happens automatically. If you're interested, you can have a look at the exact changes for a causal LM here. Locally, I'm able to run the example notebook for causal language modeling with 4-bit quantization enabled (I just added load_in_4bit=True when loading the model and called model = prepare_model_for_kbit_training(model) before get_peft_model). The notebook uses BLOOMz-560M, but Llama-2 should work as well.
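
To make the recipe concrete, here is a minimal sketch of those steps. The model name, the BitsAndBytesConfig settings, and the num_virtual_tokens value are illustrative choices on my part, not taken from the notebook:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import (
    PromptTuningConfig,
    TaskType,
    get_peft_model,
    prepare_model_for_kbit_training,
)

# Illustrative model; Llama-2 should follow the same pattern.
model_name = "bigscience/bloomz-560m"

# Load the base model in 4-bit via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute dtype is an assumption
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)

# Prepare the quantized model for training (gradient checkpointing,
# casting layer norms, etc.).
model = prepare_model_for_kbit_training(model)

# Attach prompt tuning: only the virtual prompt embeddings are trainable.
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=8,  # illustrative value
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```

Training then proceeds as in the notebook: the quantized base weights stay frozen, and only the virtual prompt embeddings are updated.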

Sai-Ashish commented 1 year ago

Thanks a lot for an elaborate answer @SumanthRH !

github-actions[bot] commented 12 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.