uptrain-ai / uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
https://uptrain.ai/
Apache License 2.0

Custom Prompts - Example clarification #711

Open terry07 opened 5 months ago

terry07 commented 5 months ago

Describe the bug
I am running the example exactly as it appears on this page: https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-prompt-eval.

To Reproduce
I use Python 3.10.14 and UpTrain 0.7.1 on Ubuntu 22.04.4 LTS through WSL 2.1.5.0. The only change I make is to use Azure OpenAI credentials, which do not seem to cause any issue, since I already use your library with them for the other metrics you have implemented.
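For reference, the setup looks roughly like this. This is only a sketch: the Azure-related Settings parameters and the CustomPromptEval arguments below are paraphrased from the docs and may not match the exact example or your installed version.

```python
from uptrain import CustomPromptEval, EvalLLM, Settings

# Assumed Azure OpenAI configuration (parameter names may differ in your version).
settings = Settings(
    model="azure/<your-deployment-name>",
    azure_api_key="<AZURE_OPENAI_KEY>",
    azure_api_base="<AZURE_OPENAI_ENDPOINT>",
    azure_api_version="<API_VERSION>",
    response_format={"type": "json_object"},  # the option that triggers the 400 below
)

# Illustrative prompt and data; the real example is on the linked docs page.
custom_prompt = """
Evaluate the response to the question and answer with one of the given choices.
Question: {question}
Response: {response}
"""
data = [{"question": "What is the capital of France?", "response": "Paris"}]

eval_llm = EvalLLM(settings=settings)
results = eval_llm.evaluate(
    data=data,
    checks=[
        CustomPromptEval(
            prompt=custom_prompt,
            choices=["Correct", "Unclear", "Incorrect"],
            choice_scores={"Correct": 1.0, "Unclear": 0.5, "Incorrect": 0.0},
        )
    ],
)
print(results)  # expected: Choice, Explanation and score_custom_prompt per row
```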

Expected behavior
To get a JSON output with the mentioned fields: Choice, Explanation, and score_custom_prompt.

Screenshots

2024-05-28 15:22:29.858 | WARNING | uptrain.operators.language.llm:fetch_responses:268 - Detected a running event loop, scheduling requests in a separate thread.
0%| | 0/1 [00:00<?, ?it/s]
2024-05-28 15:22:30.299 | ERROR | uptrain.operators.language.llm:async_process_payload:103 - Error when sending request to LLM API: Error code: 400 - {'error': {'message': "Invalid parameter: 'response_format' of type 'json_object' is not supported with this model.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}
100%|██████████| 1/1 [00:00<00:00, 2.28it/s]
/home/stkarlos/miniconda3/envs/test-uptrain/lib/python3.10/site-packages/uptrain/operators/language/llm.py:271: RuntimeWarning: coroutine 'LLMMulticlient.async_fetch_responses' was never awaited
  with ThreadPoolExecutor(max_workers=1) as executor:
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
2024-05-28 15:22:30.310 | ERROR | uptrain.operators.language.custom:evaluate_local:144 - Error when processing payload at index 0: Error code: 400 - {'error': {'message': "Invalid parameter: 'response_format' of type 'json_object' is not supported with this model.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}
2024-05-28 15:22:30.363 | INFO | uptrain.framework.evalllm:evaluate:376 - Local server not running, start the server to log data and visualize in the dashboard!

[Screenshot attached: Screenshot 2024-05-28 153928]

Additional context

When I remove the response_format option from the Settings object, I get a score_custom_prompt of 0.5, as the example states, but I do not get back the "Explanation" or the "Choice" (although the latter is trivial to recover using a mapping).
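For anyone else hitting this, the mapping I mean is just the inverse of the check's choice_scores. The labels and values below are illustrative, not taken from the actual example:

```python
# Recover the Choice label from the returned score by inverting choice_scores.
# The choice names and scores here are assumed example values.
choice_scores = {"Correct": 1.0, "Unclear": 0.5, "Incorrect": 0.0}
score_to_choice = {score: choice for choice, score in choice_scores.items()}

row = {"score_custom_prompt": 0.5}  # what the evaluation currently returns
row["Choice"] = score_to_choice.get(row["score_custom_prompt"])
print(row)  # {'score_custom_prompt': 0.5, 'Choice': 'Unclear'}
```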

Thanks for your time.

deni-topalova commented 2 months ago

We face the same issues when we try to build custom evaluations with CustomPromptEval. Sometimes we get this error: "2024-08-29 19:10:21.296 | ERROR | uptrain.operators.language.custom:evaluate_local:144 - Error when processing payload at index 0: None", and from what I saw while debugging, it happens because the response it tries to parse, '```json\n{\n "Choice": "Correct",\n "Explanation": "...."\n}\n```', is not in the expected format (the JSON is wrapped in a markdown code fence).

Apart from that, we would also like to be able to get the "Explanation" and not only the score. Thank you!
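Coming back to the parsing failure above: a pre-processing step along these lines would make that kind of response parseable. This is just a sketch, not the library's actual parsing code, and parse_llm_json is a hypothetical helper:

````python
import json
import re


def parse_llm_json(raw: str) -> dict:
    """Hypothetical helper: extract a JSON object from an LLM response that may be
    wrapped in a markdown code fence (three backticks, optionally tagged 'json')."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, flags=re.DOTALL)
    payload = match.group(1) if match else raw
    return json.loads(payload)


# The response shape we saw while debugging:
raw = '```json\n{\n "Choice": "Correct",\n "Explanation": "...."\n}\n```'
print(parse_llm_json(raw))  # {'Choice': 'Correct', 'Explanation': '....'}
````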

deni-topalova commented 2 months ago

Hey @Dominastorm! I am tagging you since I saw that you implemented the CustomPromptEval evaluation :) Could you please have a look at this issue whenever you have some time? It is causing a lot of failures on our side. If your team doesn't have the capacity to fix it, we can try to contribute a fix ourselves; just let us know. Thank you!

Dominastorm commented 2 months ago

Hey @deni-topalova! Thank you for reaching out and bringing this to our attention. Unfortunately, this issue is beyond our current capacity to address. However, we would greatly appreciate any contributions from your side. If you're able to create a fix, please feel free to submit a PR and tag me or @ashish-1600. We will review it as soon as possible.