marco-c opened 3 months ago
Use the accepted examples (+ human comments? + synthetic comments from bug fixes?) as input for the finetuning. Could we use preference optimization to also learn from the rejected examples? We should also find out the cost of hosting the finetuned models.
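A minimal sketch of what the preference data could look like, assuming we export accepted and rejected comments per patch into the prompt/chosen/rejected layout that DPO-style trainers (e.g. trl's `DPOTrainer`) expect. The field names and the input/output files are hypothetical:

```python
import json

def build_preference_dataset(examples_path: str, out_path: str) -> None:
    """Convert exported review examples into DPO-style preference pairs.

    Each input line is assumed to be a JSON object with hypothetical
    'patch', 'accepted_comment' and 'rejected_comment' fields.
    """
    with open(examples_path) as src, open(out_path, "w") as out:
        for line in src:
            example = json.loads(line)
            record = {
                # The patch (plus any surrounding context) becomes the prompt.
                "prompt": f"Review the following patch:\n{example['patch']}",
                # The comment reviewers accepted is the preferred completion...
                "chosen": example["accepted_comment"],
                # ...and a rejected one is the dispreferred completion.
                "rejected": example["rejected_comment"],
            }
            out.write(json.dumps(record) + "\n")

build_preference_dataset("review_examples.jsonl", "dpo_train.jsonl")
```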
Past review comments are sometimes very short and might lack enough context to be good examples for finetuning an LLM.
We could prompt GPT-4 with the patch and the provided comment, asking it to explain how it would arrive at that comment, and then finetune a smaller LLM to learn what GPT-4 did.
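A rough sketch of that distillation step, using the OpenAI chat completions API; the model name, prompt wording, and the `patch`/`comment` inputs are placeholders, and the returned explanation would become the target text for finetuning the smaller model:

```python
from openai import OpenAI

client = OpenAI()

def explain_review_comment(patch: str, comment: str) -> str:
    """Ask a stronger model to reconstruct the reasoning behind an existing
    review comment, so (patch, reasoning + comment) pairs can be used to
    finetune a smaller LLM."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; any sufficiently strong model
        messages=[
            {"role": "system", "content": "You are an experienced code reviewer."},
            {
                "role": "user",
                "content": (
                    "Here is a patch:\n"
                    f"{patch}\n\n"
                    "A reviewer left this comment:\n"
                    f"{comment}\n\n"
                    "Explain step by step how a reviewer could arrive at this "
                    "comment from the patch alone."
                ),
            },
        ],
    )
    return response.choices[0].message.content
```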
For example, I prompted GPT-3.5 with:
And it replied with:
It isn't perfect, but it was just a quick test. With GPT-4 and more structure we could get something better.