microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
https://microsoft.github.io/promptflow/
MIT License

[BUG] Evaluate on test dataset using evaluate() with SimilarityEvaluator returns NaN #3381

Closed · bhonris closed this 2 months ago

bhonris commented 4 months ago

Describe the bug
When running an evaluation on a dataset with evaluate() and the similarity evaluator, I have come across some scenarios where the result is not a number (NaN).

How to reproduce the bug
Model config:

{azure_deployment = "gpt4-turbo-preview", api_version = "2024-02-01"}

jsonl file:

{"Question":"How can you get the version of the Kubernetes cluster?","Answer":"{\"code\": \"kubectl version\" }","output":"{code: kubectl version --output=json}"}

Evaluate config:

# Imports assume the promptflow-evals package, which provides evaluate() and the built-in evaluators.
from promptflow.evals.evaluate import evaluate
from promptflow.evals.evaluators import SimilarityEvaluator

result = evaluate(
    data="testdata2.jsonl",
    evaluators={
        "similarity": SimilarityEvaluator(model_config)
    },
    evaluator_config={
        # Map dataset columns to the evaluator's expected inputs.
        "default": {
            "question": "${data.Question}",
            "answer": "${data.output}",
            "ground_truth": "${data.Answer}"
        }
    }
)
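For context, a minimal sketch of how the model_config used above could be constructed; the endpoint and key are placeholders, not values from the original report:

from promptflow.core import AzureOpenAIModelConfiguration

# Placeholder endpoint/key; the deployment and API version match the report above.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    azure_deployment="gpt4-turbo-preview",
    api_version="2024-02-01",
)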

Expected behavior
The value returned is a number (the similarity score).


bhonris commented 4 months ago

I added the following text to similarity.prompty: "You will respond with a single digit number between 1 and 5. You will include no other text or information", and this seems to fix the issue.
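This workaround points at the likely failure mode: the evaluator expects the completion to be a bare numeric score, so any surrounding text makes the parse fail and the score comes back as NaN. As an illustration only, a hedged sketch of a more tolerant parse (a hypothetical helper, not promptflow's actual implementation):

import math
import re

def parse_similarity_score(completion: str) -> float:
    """Extract a 1-5 score from a model completion, tolerating extra text."""
    match = re.search(r"\b([1-5])\b", completion)
    if match:
        return float(match.group(1))
    return math.nan  # mirrors the observed behavior for unparseable output

# A chatty preview-model reply still yields a usable score:
print(parse_similarity_score("The similarity score is 4 out of 5."))  # 4.0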

brynn-code commented 4 months ago

Hi @singankit and @luigiw, could you please help take a look at this issue?

luigiw commented 4 months ago

@bhonris, thank you for reporting the issue and sharing a workaround. It is a known issue that some preview OpenAI models can produce NaN results. Please also try with stable model versions.
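Following that suggestion, the only change needed is the deployment name in the model configuration; the stable deployment name below is an assumption, so substitute whatever deployment your Azure OpenAI resource exposes:

from promptflow.core import AzureOpenAIModelConfiguration

# Assumed stable GA deployment instead of "gpt4-turbo-preview".
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    azure_deployment="gpt-4",
    api_version="2024-02-01",
)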

github-actions[bot] commented 3 months ago

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!

luigiw commented 2 months ago

Fixed in version 0.3.2.
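Assuming the fix shipped in the promptflow-evals package (where evaluate() and SimilarityEvaluator live), upgrading with `pip install -U promptflow-evals` should pick it up.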