Closed — timothylimyl closed this issue 12 months ago
No idea how well the fully fine-tuned models work; they could perform better. You can test on academic benchmarks using lm-eval-harness to get some idea, but what we really need is a way to evaluate conversational ability in a chatbot setting.
Training on the latest version of this dataset produces a very capable Alpaca-style model. I'm seeing much better results than with the original Alpaca dataset.
It seems the evaluation comparisons were all made under the LoRA training scheme. Any thoughts on how full fine-tuning compares to the LoRA approach?