Closed: pamelafox closed this issue 1 month ago
cc @cedricvidal
I just tried this with 0.3.2, and I am now getting a numeric score instead of "Based" (which parsed to nan). So something seems to have changed in the latest version that improves this, at least for this particular QA pair. @cedricvidal, could you try running on the full set to see whether they're all working now?
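(For reference, a minimal sketch of what running over the full set could look like, assuming the promptflow-evals `evaluate` API, a gpt-4o Azure OpenAI deployment as the judge model, and a hypothetical `qa_pairs.jsonl` file with `answer` and `context` columns:)

```python
from promptflow.core import AzureOpenAIModelConfiguration
from promptflow.evals.evaluate import evaluate
from promptflow.evals.evaluators import GroundednessEvaluator

# Assumed setup: endpoint, key, and deployment name are placeholders.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    azure_deployment="gpt-4o",
)

# Run the groundedness evaluator across every row in the dataset.
results = evaluate(
    data="qa_pairs.jsonl",
    evaluators={"groundedness": GroundednessEvaluator(model_config)},
)
print(results["metrics"])  # check whether any groundedness values come back as nan
```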
Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!
Describe the bug
Using code like this:
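(The original snippet is not reproduced in this excerpt; below is a minimal sketch of the kind of call being described, where the evaluator choice, endpoint, and gpt-4o deployment name are assumptions:)

```python
from promptflow.core import AzureOpenAIModelConfiguration
from promptflow.evals.evaluators import GroundednessEvaluator

# Assumed configuration: an Azure OpenAI gpt-4o deployment used as the judge model.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    azure_deployment="gpt-4o",
)

groundedness_eval = GroundednessEvaluator(model_config)
result = groundedness_eval(
    answer="The capital of Japan is Tokyo.",
    context="Tokyo is Japan's capital and largest city.",
)
print(result)  # expected something like a 1-5 groundedness score, but we see nan
```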
We get a result of nan for the score, and when we print out the `llm_output` from inside the promptflow code, we simply see an output of the word "Based". It would be great if the prompts could be improved to work with gpt-4o as the evaluation model, since many developers would like to use it instead of gpt-4.
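(For illustration only, a hypothetical parser, not the actual promptflow-evals code, showing how a prose completion such as "Based" turns into nan when a numeric score is expected:)

```python
import math

def parse_score(llm_output: str) -> float:
    # Hypothetical sketch: if the judge model replies with prose
    # ("Based on the context ...") instead of a bare number,
    # the numeric conversion fails and the score falls back to nan.
    try:
        return float(llm_output.strip())
    except ValueError:
        return math.nan

print(parse_score("5"))      # 5.0
print(parse_score("Based"))  # nan
```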
How To Reproduce the bug
Steps to reproduce the behavior and how frequently you can experience the bug:
Expected behavior
We expected the prompts to work well for gpt-4o as well.
Running Information (please complete the following information):
- Promptflow Package Version using `pf -v`: { "promptflow": null, "promptflow-azure": "1.14.0", "promptflow-core": "1.14.0", "promptflow-devkit": "1.14.0", "promptflow-evals": "0.3.1", "promptflow-tracing": "1.14.0" }
- Operating System: GitHub Codespaces, python3:12 image
- Python Version using `python --version`: Python 3.12.5