I'm facing an issue while fine-tuning LLAMA-2-7b-chat and would appreciate some suggestions.
I use a system prompt that defines a set of keys, then provide an instruction and ask the model to generate a JSON output containing those keys. I am using the 7b-chat model, and out of the box, even with just 5 examples, the output is fine.
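To make the setup concrete, one sample looks roughly like this (the key names and values here are made up for this post):

```python
# Hypothetical example of one such sample (keys are illustrative):
sys_prompt = (
    "You are an extraction assistant. Respond only with a JSON object "
    'containing exactly these keys: "name", "date", "amount".'
)
instruction = "Invoice from Acme Corp dated 2023-05-01 for $120."
# Output the model is expected to generate:
output = '{"name": "Acme Corp", "date": "2023-05-01", "amount": "$120"}'
```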
When I take 1000 such examples and fine-tune with PEFT-QLoRA (each sample consists of the system prompt, instruction, and output in the LLAMA-2 prompt structure), I no longer get proper results.
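My training setup is roughly the following (simplified; the hyperparameter values here are illustrative, not necessarily the ones I used):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-chat-hf"

# 4-bit quantization config (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```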
What could be the issue here?
Is it correct to combine the system prompt, instruction, and output in the LLAMA-2 prompt structure, i.e.

```python
f"<s> [INST] <<SYS>>\n{sys_prompt}\n<</SYS>>\n\n{instruction} [/INST] {output} </s>"
```

Or should I be using something else?
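Concretely, I build each training sample with a helper like this (one thing I'm unsure about is whether `<s>` and `</s>` should be literal text here or left to the tokenizer's special-token handling):

```python
def format_sample(sys_prompt: str, instruction: str, output: str) -> str:
    # LLAMA-2 chat prompt structure as I currently apply it; should <s> and
    # </s> appear as literal text, or be added by the tokenizer instead?
    return (
        f"<s> [INST] <<SYS>>\n{sys_prompt}\n<</SYS>>\n\n"
        f"{instruction} [/INST] {output} </s>"
    )
```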
For this task, should I be using 7b-chat or the base 7b model?
Could quantization be causing an issue here? Why would I not get the expected output even after tuning the model with 1000 examples?
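For reference, this is roughly how I check the tuned adapter after training (the adapter path is hypothetical, and `sys_prompt`/`instruction` are as in the example above):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.bfloat16, device_map="auto"
)
tuned = PeftModel.from_pretrained(base, "./llama2-json-adapter")  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Same prompt structure as training, but without the output and closing </s>
prompt = f"<s> [INST] <<SYS>>\n{sys_prompt}\n<</SYS>>\n\n{instruction} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(tuned.device)
out = tuned.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```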
Thanks in advance.