andrewor14 opened 2 months ago
I had problems fine-tuning Llama3.1 with torchtune too (i.e. the fine-tuned model performs worse than the original). I think one problem is that the Llama3 recipes in torchtune use the instruct version, which can be difficult to fine-tune.
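If the instruct checkpoint is the culprit, one workaround is to point the recipe config at the base model instead. A hypothetical fragment of what that override might look like in a torchtune YAML config (field names and paths are assumptions and may differ across torchtune versions):

```yaml
# Assumed fragment of a llama3 full-finetune config; adjust to your version.
model:
  # base model builder instead of the instruct variant
  _component_: torchtune.models.llama3.llama3_8b
checkpointer:
  # directory populated by `tune download meta-llama/Meta-Llama-3-8B`
  checkpoint_dir: /tmp/Meta-Llama-3-8B/original/
```

The same fields can also be overridden on the `tune run` command line instead of editing the YAML.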
The exact same workflow used to work in late June (can't find the exact commit now) but seems to be broken in recent commits (8/13/24).
Do you remember if you used Llama2 or Llama3? Fine-tuning base Llama2 (the non-instruct version) works fine for me.
> Do you remember if you used Llama2 or Llama3?
This was Llama3-8B
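As a quick sanity check for the loss-not-decreasing symptom discussed in this thread, a small helper like the following (hypothetical, not part of torchtune) can be run over the per-step losses a recipe logs to confirm whether training made any progress at all:

```python
def loss_decreased(losses, frac=0.2, min_drop=0.05):
    """Return True if the mean loss over the last `frac` of steps is at
    least `min_drop` (relative) below the mean over the first `frac`.

    `frac` and `min_drop` are arbitrary thresholds, not torchtune defaults.
    """
    n = max(1, int(len(losses) * frac))
    head = sum(losses[:n]) / n   # average loss at the start of training
    tail = sum(losses[-n:]) / n  # average loss at the end of training
    return (head - tail) / head >= min_drop
```

For a healthy run this returns True; for the broken run described here (flat loss over 2000 steps) it would return False.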
I'm fine-tuning Llama3-8B on the C4 dataset (en subset) for 2000 steps using the `full_finetune_distributed` recipe. I find that the loss does not go down at all and the quantized accuracy is very low. The exact same workflow used to work in late June (can't find the exact commit now) but seems to be broken in recent commits (8/13/24).

Eval quantized fine-tuned checkpoint:
Eval quantized original checkpoint (no fine-tuning):
Some relevant configs: