
Incorrect inference after PEFT (QLoRA) #701

Open rsaxena-rajat opened 11 months ago

rsaxena-rajat commented 11 months ago

I'm facing an issue while fine-tuning Llama-2-7b-chat and would appreciate some suggestions.

  1. I use a specific system prompt that defines a set of keys, then provide an instruction and ask the model to generate a JSON output with those keys. I am using the 7b-chat model. Even with just 5 examples, the output is fine.
  2. When I take 1000 such examples and fine-tune the model with PEFT (QLoRA), where each sample consists of the system prompt, instruction, and output in the Llama-2 prompt structure, I no longer get proper results (the setup is sketched below).
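
For context, the fine-tuning setup looks roughly like the following (a simplified sketch; the model path and hyperparameters are illustrative, not the exact values from this run):

```python
# Rough sketch of the QLoRA setup (values are illustrative).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-chat-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights (QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```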

What could be the issue here?

  1. Is it correct to use the system prompt, instruction, and output in the Llama-2 prompt structure ( f"<s> [INST] <<SYS>>\n{sys_prompt}\n<</SYS>>\n\n{instruction} [/INST] {output} </s>" ), or should I be using something else? A filled-in example is sketched after this list.
  2. For this exercise, should the 7b-chat model be used, or the base 7b?
  3. Could quantization be causing an issue here? Why would I not get the expected output even after tuning the model with 1000 examples?
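
For reference, here is roughly how a single training sample is assembled with the template from question 1 (the helper name and example values are hypothetical, just to show the shape of one sample):

```python
# Illustrative sketch of one training sample in the Llama-2 prompt structure.
def format_sample(sys_prompt: str, instruction: str, output: str) -> str:
    return (
        f"<s> [INST] <<SYS>>\n{sys_prompt}\n<</SYS>>\n\n"
        f"{instruction} [/INST] {output} </s>"
    )

sample = format_sample(
    sys_prompt="Return a JSON object with the keys: vendor, date, amount.",
    instruction="Paid Acme Corp $120 on 2023-05-01.",
    output='{"vendor": "Acme Corp", "date": "2023-05-01", "amount": 120}',
)
print(sample)
```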

Thanks in advance.

HumzaSami00 commented 10 months ago
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```

This is the prompt format used for the chat version of Llama-2, as mentioned here.
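
As a sanity check, recent versions of transformers can also render this format via the tokenizer's built-in chat template instead of a hand-written f-string (a minimal sketch, assuming transformers >= 4.34 and access to the meta-llama/Llama-2-7b-chat-hf tokenizer):

```python
# Sketch: producing the Llama-2 chat format from the tokenizer's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the text below as JSON."},
]

# The rendered prompt ends with " [/INST]", so the model continues with the
# assistant's answer; no trailing </s> is appended for inference.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```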