[BUG] Default prompt for similarity does not work with gpt-4o

pamelafox commented 2 months ago

Describe the bug

Using code like this:

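```python
from promptflow.evals.evaluators import SimilarityEvaluator

# model_config points at the gpt-4o deployment used as the evaluation model;
# sample is one QA record from the dataset being evaluated.
similarity = SimilarityEvaluator(model_config)

similarity_score = similarity(
    question=sample['question'],
    answer=sample['final_answer'],
    ground_truth=sample['gold_final_answer'],
)
```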

We get nan as the score, and when we print llm_output from inside the promptflow code, the model's entire output is the single word "Based".
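One plausible reading of the nan: the evaluator expects the model to reply with a bare score (for example "4"), and a reply that starts with prose such as "Based on ..." cannot be converted to a number. The sketch below is hypothetical and is not promptflow's parsing code; `naive_parse` and `tolerant_parse` are illustrative names, and the 1 to 5 range is an assumption about the similarity scale.

```python
import math
import re


def naive_parse(llm_output: str) -> float:
    """Hypothetical strict parser: the whole reply must be a number, else nan."""
    try:
        return float(llm_output.strip())
    except ValueError:
        return math.nan


def tolerant_parse(llm_output: str, low: int = 1, high: int = 5) -> float:
    """Hypothetical lenient parser: first integer within [low, high], else nan."""
    for match in re.finditer(r"\d+", llm_output):
        value = int(match.group())
        if low <= value <= high:
            return float(value)
    return math.nan


print(naive_parse("4"))                                      # 4.0
print(naive_parse("Based"))                                  # nan
print(tolerant_parse("Based on the answers, I rate it 4."))  # 4.0
print(tolerant_parse("Based"))                               # nan
```

Neither parser can recover a score from the bare reply "Based", since it contains no number at all, which is why the fix has to come from the prompt rather than from post-processing.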

It would be great if the prompts could be improved to work with gpt-4o as the evaluation model, since many developers would like to use it instead of gpt-4.

How To Reproduce the bug

Steps to reproduce the behavior, and how frequently you experience the bug:

We always experience it for this data, on every question.

Expected behavior

We expected the prompts to work well for gpt-4o as well.


Running Information (please complete the following information):

Promptflow Package Version using pf -v: { "promptflow": null, "promptflow-azure": "1.14.0", "promptflow-core": "1.14.0", "promptflow-devkit": "1.14.0", "promptflow-evals": "0.3.1", "promptflow-tracing": "1.14.0" }
Operating System: GitHub Codespaces, python3:12 image
Python Version using python --version: Python 3.12.5
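To collect the same package versions from Python (for example inside a notebook), here is a sketch using the standard library's `importlib.metadata`. A distribution that is not installed maps to None, which `json.dumps` writes as null. This is an illustration only, not how `pf -v` itself is implemented; `installed_versions` is a hypothetical helper, and the package list is copied from the output above.

```python
from __future__ import annotations

import json
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the `pf -v` output in this report.
PACKAGES = [
    "promptflow",
    "promptflow-azure",
    "promptflow-core",
    "promptflow-devkit",
    "promptflow-evals",
    "promptflow-tracing",
]


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: dict[str, str | None] = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # serialized as null by json.dumps
    return versions


if __name__ == "__main__":
    print(json.dumps(installed_versions(PACKAGES), indent=2))
```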

pamelafox commented 2 months ago

cc @cedricvidal

pamelafox commented 2 months ago

I just tried this with promptflow-evals 0.3.2, and I am getting a numeric score instead of "Based" (nan). So something seems to have changed in the last version to improve this, at least for this particular QA pair. @cedricvidal, could you try running it on the full set to see whether they're all working now?
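A sketch of the full-set check: run the evaluator over every QA pair and count how many scores still come back as nan. It assumes the sample keys from the original snippet and that the evaluator returns a dict holding the score under `gpt_similarity` (check the output of your installed version); `count_nan_scores` is a hypothetical helper, not part of promptflow.

```python
from __future__ import annotations

import math
from typing import Callable, Iterable, Mapping


def count_nan_scores(
    evaluator: Callable[..., Mapping[str, float]],
    samples: Iterable[Mapping[str, str]],
    score_key: str = "gpt_similarity",  # assumed result key
) -> tuple[int, int]:
    """Run the evaluator on every sample and return (nan_count, total)."""
    nan_count = 0
    total = 0
    for sample in samples:
        result = evaluator(
            question=sample["question"],
            answer=sample["final_answer"],
            ground_truth=sample["gold_final_answer"],
        )
        total += 1
        # A missing key or a None score is treated the same as nan.
        score = result.get(score_key, math.nan)
        if score is None or math.isnan(float(score)):
            nan_count += 1
    return nan_count, total


if __name__ == "__main__":
    # Stand-in evaluator so the sketch runs without a gpt-4o deployment.
    def fake_evaluator(question, answer, ground_truth):
        return {"gpt_similarity": 5.0 if answer == ground_truth else math.nan}

    data = [
        {"question": "q1", "final_answer": "a", "gold_final_answer": "a"},
        {"question": "q2", "final_answer": "b", "gold_final_answer": "c"},
    ]
    print(count_nan_scores(fake_evaluator, data))  # (1, 2)
```

With the real evaluator, pass `SimilarityEvaluator(model_config)` as `evaluator`; a nan count of 0 across the full set would suggest the 0.3.2 behavior holds beyond this one pair.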

github-actions[bot] commented 1 month ago

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

microsoft/promptflow #3648