Feature request
I would like to ask whether prompt tuning is supported with a 4-bit quantized Llama-2 model.
Motivation
4-bit quantization makes models easy to use and hardware friendly. Can it be combined with prompt tuning in this library?
Your contribution
I believe it should be compatible, since full-precision Llama-2 (CausalLM) already works with prompt tuning.
Not a maintainer, but I will chime in: prompt tuning with 4-bit quantization should work! Prompt tuning simply modifies the input embeddings into the transformer, so all the necessary dtype casting should happen automatically. If you're interested, you can have a look at the exact changes for a causal LM here. Locally, I'm able to run the example notebook for causal language modeling with 4-bit quantization enabled (I just added `load_in_4bit=True` at init and `model = prepare_model_for_kbit_training(model)` before calling `get_peft_model`). The notebook uses BLOOMz-560M, but Llama-2 should also work.
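For reference, a minimal sketch of that setup with the standard PEFT prompt-tuning API; the checkpoint name and the `num_virtual_tokens` value are illustrative choices, not values from this thread:

```python
# Minimal sketch: prompt tuning on a 4-bit quantized causal LM.
# Requires transformers, peft, bitsandbytes, and accelerate.
from transformers import AutoModelForCausalLM
from peft import (
    PromptTuningConfig,
    TaskType,
    get_peft_model,
    prepare_model_for_kbit_training,
)

# Load the base model in 4-bit; a Llama-2 checkpoint should work the same way.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-560m",  # illustrative checkpoint
    load_in_4bit=True,
    device_map="auto",
)

# Prepare the quantized model for training (e.g., casts norm layers to fp32
# and enables gradients on the input embeddings).
model = prepare_model_for_kbit_training(model)

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=8,  # illustrative value
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the prompt embeddings are trainable
```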
Thanks a lot for the elaborate answer, @SumanthRH!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Sai-Ashish closed this issue 11 months ago.