microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
https://microsoft.github.io/promptflow/
MIT License

[BUG] promptflow is trying to upload my entire working directory while running with evaluate API #3584

Closed yanggaome closed 1 month ago

yanggaome commented 3 months ago

Describe the bug

I noticed that while running the promptflow evaluate API, it tries to upload my entire working directory to the remote run. In the case below it failed because my working directory contains a special file (a soft link) that cannot be uploaded, even though this file is completely irrelevant to my evaluation run; it just happens to live in my working directory.

This happens when I provide the target as a separate Python class.

But when the target simply invokes a prompt flow (flow.dag.yaml), the working directory does not seem to be uploaded.

Questions: I can certainly remove this special file to make the run work, but I would like to understand why the entire working directory needs to be uploaded, and how to prevent this.
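
For reference, the setup is roughly as follows (a hypothetical minimal sketch; the data file, class, and field names are illustrative, not my actual code):

    # Hypothetical minimal example of passing the target as a Python class
    # to the promptflow-evals evaluate API.
    from promptflow.evals.evaluate import evaluate

    class MyTarget:
        """Toy target: answers a question by calling the app under test."""
        def __call__(self, *, question: str) -> dict:
            # ... call the model / application being evaluated ...
            return {"answer": f"echo: {question}"}

    results = evaluate(
        data="data.jsonl",      # hypothetical input file with a "question" column
        target=MyTarget(),      # target provided as a Python class instance
        evaluators={},          # evaluators omitted for brevity
    )

The run then fails with the following traceback: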

    results = evaluate(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 111, in wrapper
    result = func(*args, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 365, in evaluate
    raise e
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 340, in evaluate
    return _evaluate(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 401, in _evaluate
    input_data_df, target_generated_columns, target_run = _apply_target_to_data(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 183, in _apply_target_to_data
    run = pf_client.run(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 301, in run
    return self._run(
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 226, in _run
    return self.runs.create_or_update(run=run, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
    return f(self, *args, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/operations/_run_operations.py", line 135, in create_or_update
    created_run = RunSubmitter(client=self._client).submit(run=run, **kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 52, in submit
    task_results = [task.result() for task in tasks]
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 52, in <listcomp>
    task_results = [task.result() for task in tasks]
  File "/anaconda/envs/azureml_py38/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/anaconda/envs/azureml_py38/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/anaconda/envs/azureml_py38/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 134, in _run_bulk
    self._submit_bulk_run(flow=flow, run=run, local_storage=local_storage)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 226, in _submit_bulk_run
    local_storage.dump_snapshot(flow)
  File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/operations/_local_storage_operations.py", line 256, in dump_snapshot
    shutil.copytree(
  File "/anaconda/envs/azureml_py38/lib/python3.9/shutil.py", line 568, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
  File "/anaconda/envs/azureml_py38/lib/python3.9/shutil.py", line 522, in _copytree
    raise Error(errors)
shutil.Error: [('<A soft link file>', '<A soft link file>', "[Errno 21] Is a directory: '<A soft link file>'")]


Running Information:

{ "promptflow": "1.13.0", "promptflow-azure": "1.13.0", "promptflow-core": "1.13.0", "promptflow-devkit": "1.13.0", "promptflow-evals": "0.3.1", "promptflow-tracing": "1.13.0" }

Executable '/anaconda/envs/azureml_py38/bin/python' Python (Linux) 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:50:21) [GCC 12.3.0]


wangchao1230 commented 3 months ago

Promptflow uploads your code snapshot to the remote run; for a flow.dag.yaml it is usually the directory that contains the flow.dag.yaml. For your case, could you share more on how you specify the target flow? I guess it somehow treats the current working directory as the code snapshot root.
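
For illustration, a rough sketch of the flow.dag.yaml case the issue contrasts against (folder and file names here are hypothetical): when the target is a DAG flow, the snapshot root is the flow folder, i.e. the directory containing flow.dag.yaml, rather than the whole working directory.

    # Hypothetical sketch: submitting a DAG flow with the local PFClient.
    # Only the flow folder ("./my_flow", the directory holding flow.dag.yaml)
    # would be treated as the run snapshot root.
    from promptflow.client import PFClient

    pf = PFClient()
    run = pf.run(
        flow="./my_flow",      # hypothetical folder containing flow.dag.yaml
        data="./data.jsonl",   # hypothetical input file
    )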

D-W- commented 3 months ago

To add to @wangchao1230's comment: the reason we need to upload the whole working directory when a flow runs remotely is to make sure all of the current flow's dependencies can be found on the remote side. Uploading the working directory ensures the flow runs in the same environment locally and in the cloud. For example, if you used a customized evaluator, we need to upload the whole working directory so that the customized evaluator can be loaded successfully in the remote run. But if you only used built-in evaluators, I think some improvements can be made to skip uploading the whole working directory and upload only the flow YAML file. +@luigiw @singankit for comment here.
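
To make the dependency point concrete, here is a rough, hypothetical sketch (module, class, and field names are made up): a customized evaluator is just local Python code somewhere in the working directory, so the remote run can only import it if that code is part of the uploaded snapshot.

    # my_evaluators/keyword.py -- hypothetical module inside the working directory
    class KeywordEvaluator:
        """Toy custom evaluator: checks whether the answer mentions a keyword."""
        def __init__(self, keyword: str):
            self.keyword = keyword

        def __call__(self, *, answer: str) -> dict:
            return {"contains_keyword": self.keyword.lower() in answer.lower()}

    # Hypothetical usage: the remote run needs my_evaluators/keyword.py in the
    # snapshot to resolve this import, which is why the working directory is uploaded.
    from promptflow.evals.evaluate import evaluate
    from my_evaluators.keyword import KeywordEvaluator

    results = evaluate(
        data="data.jsonl",
        evaluators={"keyword": KeywordEvaluator("promptflow")},
    )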

github-actions[bot] commented 2 months ago

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!