Open yanggaome opened 2 months ago
One observation: when I created the target user call (a class shaped like an evaluator, with `__init__` and `__call__` methods) and passed the loaded data (~100 lines of JSONL) explicitly as a constructor argument, i.e. `TargetCallClass(config, input_data)`,
then I got this error.
~If I instead passed the loaded data via `config.input_data`, i.e. `TargetCallClass(config)`, it behaved better: it succeeded on one machine but still failed on another. Not sure if this is related, but just sharing the observation.~
do you have any idea about this?
Also, if I set `_use_pf_client` to false with 0.3.1 evals, it worked fine (same as 0.3.0, since both then use `CodeClient` rather than `ProxyClient`).
Hi @ninghu, I think I was able to identify the root cause.
The issue is that when the evaluate API is called, the data path should be an absolute path, not a path relative to the working directory.
```python
_ = evaluate(
    evaluation_name=evaluation_name,
    data=data_path,
    target=custom_user_call,
    evaluators=evaluators_set,
    evaluator_config=evaluators_config,
)
```
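Given that root cause, one workaround is to resolve the path to absolute before passing it to the evaluate call; `data/eval_inputs.jsonl` below is a hypothetical relative path used for illustration:

```python
from pathlib import Path

# Hypothetical relative path; resolve() makes it absolute against the current
# working directory, which sidesteps the ProxyClient relative-path issue.
data_path = str(Path("data/eval_inputs.jsonl").resolve())
print(Path(data_path).is_absolute())  # True
```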
When using `CodeClient`, the DataFrame read from `data_path` is used. When using `ProxyClient`, the data path itself is passed to the pf client run. This still puzzles me, though: the target call is also a pf client run, and it works fine with a relative path, yet the path form makes a difference when using `ProxyClient`.
Another observation: with a single evaluator (built-in or custom), `ProxyClient` + a relative path still worked.
But with two evaluators, `ProxyClient` + a relative path did not work; we had to use an absolute path.
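A plausible explanation (a guess on my part, not confirmed) is that a relative path is only meaningful with respect to whatever working directory the client uses when it re-reads the file, whereas `CodeClient` reuses the already-loaded DataFrame. A quick illustration of why the relative form is fragile:

```python
import os

rel = os.path.join("data", "eval_inputs.jsonl")  # hypothetical relative path
# An absolute path is stable no matter which process resolves it...
abs_path = os.path.abspath(rel)
# ...but the relative form is pinned to the current working directory, so a
# run executing with a different cwd would look in a different place entirely.
print(abs_path == os.path.join(os.getcwd(), rel))  # True
print(os.path.isabs(rel), os.path.isabs(abs_path))  # False True
```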
These are great findings! The behavior you described is interesting and seems to occur only when using more than one evaluator. We'll try to reproduce this issue and get back to you.
Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!
Describe the bug
After switching promptflow-evals from 0.3.0 to 0.3.1, the same code keeps giving me this error:
I noticed some changes related to performance/parallelization between 0.3.0 and 0.3.1, e.g. https://github.com/microsoft/promptflow/commit/32115d397b888be86eaea9b54039f5dd20f55dc0
Is this related to the failure? And which client is parallelized and recommended to use?
Running Information (please complete the following information):
- pf -v: [e.g. 0.0.102309906]
- python --version: [e.g. python==3.10.12]
- `{ "promptflow": "1.13.0", "promptflow-azure": "1.13.0", "promptflow-core": "1.13.0", "promptflow-devkit": "1.13.0", "promptflow-evals": "0.3.1" (or 0.3.0 as comparison), "promptflow-tracing": "1.13.0" }`