calvdee opened 2 days ago
On second read, it looks like the evaluation steps are produced via the completions API, which OpenAI now considers legacy. Even using the completions API, I am still unable to reproduce the coherence evaluation steps outlined in this prompt with either of the GPT-3.5 models:
G-Eval includes "Auto Chain-of-Thoughts for NLG Evaluation" as a component, where the CoT steps used to carry out evaluation are produced by an LLM. Neither the paper nor this repo, however, includes the prompt definition. It would be convenient to have it available, both for reproducibility and for extending G-Eval to other criteria.
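For anyone else trying to reproduce this in the meantime, here is a minimal sketch of generating the CoT evaluation steps via the (non-legacy) chat completions endpoint. Note that `TASK_INTRO`, `CRITERIA`, and the prompt layout in `build_cot_prompt` are my own guesses at the template shape — the actual prompt definition is exactly what this issue is asking for:

```python
import json
import os
import urllib.request

# Placeholder task introduction and criterion text -- NOT the paper's
# actual wording, just illustrative stand-ins.
TASK_INTRO = "You will be given one summary written for a news article."
CRITERIA = "Coherence (1-5) - the collective quality of all sentences."


def build_cot_prompt(task_intro: str, criteria: str) -> str:
    """Assemble a prompt asking the model to write its own evaluation
    steps (the auto-CoT part of G-Eval). Layout is a guess."""
    return (
        f"{task_intro}\n\n"
        f"Evaluation Criteria:\n{criteria}\n\n"
        "Evaluation Steps:"
    )


def generate_steps(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """POST to the chat completions endpoint (replaces the legacy
    completions API). Requires OPENAI_API_KEY in the environment."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        }).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(build_cot_prompt(TASK_INTRO, CRITERIA))
```

Even with this, the generated steps vary between runs and don't match the ones shipped in the repo, which is why having the original prompt published would help.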