microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
https://microsoft.github.io/promptflow/
MIT License

[BUG] RunSubmitter() hits error uploading local run with azure-ai-ml==1.19.0 #3666

Open nicsuzor opened 4 weeks ago

nicsuzor commented 4 weeks ago

Describe the bug

After upgrading to azure-ai-ml==1.19.0, I get an error trying to upload the run to cloud storage.

run = localclient.run(flow=flow, data=dataset, column_mapping=columns, stream=True)

The flow runs locally, then errors out during the upload:

[2024-08-16 11:37:50 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Uploading run 'main_my_python_tool_d7eghrw4_20240816_113743_926890' to cloud...
Traceback (most recent call last):
  File "/opt/conda/envs/python311clean/lib/python3.11/site-packages/promptflow/azure/operations/_async_run_uploader.py", line 439, in _upload_single_blob
    await blob_client.upload_blob(f, overwrite=self.overwrite)
  File "/opt/conda/envs/python311clean/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 94, in wrapper_use_tracer
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/python311clean/lib/python3.11/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 588, in upload_blob
    return cast(Dict[str, Any], await upload_block_blob(**options))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/python311clean/lib/python3.11/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 177, in upload_block_blob
    process_storage_error(error)
  File "/opt/conda/envs/python311clean/lib/python3.11/site-packages/azure/storage/blob/_shared/response_handlers.py", line 186, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: This request is not authorized to perform this operation using this permission.
RequestId:fadd120d-201e-0074-56d0-ef76fd000000
Time:2024-08-16T11:37:57.9862171Z
ErrorCode:AuthorizationPermissionMismatch
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationPermissionMismatch</Code><Message>This request is not authorized to perform this operation using this permission.
RequestId:fadd120d-201e-0074-56d0-ef76fd000000
Time:2024-08-16T11:37:57.9862171Z</Message></Error>

How To Reproduce the bug

Steps to reproduce the behavior, and how frequently you can experience the bug:

  1. Reproduce every time with:

pip install azure-ai-ml==1.19.0

and minimal test case:

from tempfile import NamedTemporaryFile
from promptflow.core import tool
from promptflow.client import PFClient as LocalPFClient
from promptflow.tracing import start_trace, trace

@tool
def my_python_tool(input1: str) -> str:
    return input1 + "!"

def run() -> None:
    data = """{"id": 1, "text": "text1"}
    {"id": 2, "text": "text2"}"""
    with NamedTemporaryFile(delete=False, suffix='.jsonl', mode='w') as f:
        f.write(data)
    dataset = f.name

    localclient = LocalPFClient()

    start_trace()

    columns = {'input1': r'${data.text}'}
    flow = my_python_tool
    run = localclient.run(flow=flow, data=dataset, column_mapping=columns, stream=True)
    print(run.status)

if __name__ == '__main__':
    run()

I went through and re-created my resource group and my workspace and storage resources trying to chase this bug down. I think I can rule out most anything else.

Expected behavior

Downgrading to azure-ai-ml==1.18.0 uploads the run with no problems.

Collecting azure-ai-ml==1.18.0
  Using cached azure_ai_ml-1.18.0-py3-none-any.whl.metadata (31 kB)
...
Installing collected packages: azure-ai-ml
  Attempting uninstall: azure-ai-ml
    Found existing installation: azure-ai-ml 1.19.0
    Uninstalling azure-ai-ml-1.19.0:
      Successfully uninstalled azure-ai-ml-1.19.0
Successfully installed azure-ai-ml-1.18.0
(python311clean) vscode ➜ /workspaces/dev (dev) $ python deps.py 
Prompt flow service has started...
[2024-08-16 11:43:42 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Upload run to cloud: True
Prompt flow service has started...
You can view the traces in local from http://127.0.0.1:23334/v1.0/ui/traces/?#run=main_my_python_tool_3wwnzg7z_20240816_114342_031288
You can view the traces in azure portal since trace destination is set to: azureml://subscriptions/.... The link will be printed once the run is finished.
[2024-08-16 11:43:44 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run main_my_python_tool_3wwnzg7z_20240816_114342_031288, log path: /home/vscode/.promptflow/.runs/main_my_python_tool_3wwnzg7z_20240816_114342_031288/logs.txt
You can view the trace detail from the following URL:
http://127.0.0.1:23334/v1.0/ui/traces/?#collection=dev&uiTraceId=0xbbf827f5f28560e31850d8bf330a6bd7
https://ai.azure.com/projecttrace/detail/0xbbf827f5f28560e31850d8bf330a6bd7?wsid=/...
You can view the trace detail from the following URL:
http://127.0.0.1:23334/v1.0/ui/traces/?#collection=dev&uiTraceId=0x2fdacda0df791760429ef7f7298eb145
https://ai.azure.com/projecttrace/detail/0x2fdacda0df791760429ef7f7298eb145?wsid=...
2024-08-16 11:43:46 +0000   78879 execution.bulk     INFO     Process 78910 terminated.
2024-08-16 11:43:46 +0000   78879 execution.bulk     WARNING  Process 78917 had been terminated.
[2024-08-16 11:43:48 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Uploading run 'main_my_python_tool_3wwnzg7z_20240816_114342_031288' to cloud...
[2024-08-16 11:43:55 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Updating run 'main_my_python_tool_3wwnzg7z_20240816_114342_031288' portal url to 'https://ai.azure.com/projectflows/trace/run/main_my_python_tool_3wwnzg7z_20240816_114342_031288/details?wsid=/subscriptions/...
2024-08-16 11:43:44 +0000   78628 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2024-08-16 11:43:44 +0000   78628 execution.bulk     INFO     Set process count to 2 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 2}.
2024-08-16 11:43:45 +0000   78628 execution.bulk     INFO     Process name(ForkProcess-4:2)-Process id(78917)-Line number(1) start execution.
2024-08-16 11:43:45 +0000   78628 execution.bulk     INFO     Process name(ForkProcess-4:1)-Process id(78910)-Line number(0) start execution.
2024-08-16 11:43:45 +0000   78628 execution.bulk     INFO     Process name(ForkProcess-4:1)-Process id(78910)-Line number(0) completed.
2024-08-16 11:43:45 +0000   78628 execution.bulk     INFO     Process name(ForkProcess-4:2)-Process id(78917)-Line number(1) completed.
2024-08-16 11:43:46 +0000   78628 execution.bulk     INFO     Finished 2 / 2 lines.
2024-08-16 11:43:46 +0000   78628 execution.bulk     INFO     Average execution time for completed lines: 1.0 seconds. Estimated time for incomplete lines: 0.0 seconds.
2024-08-16 11:43:46 +0000   78628 execution.bulk     INFO     The thread monitoring the process [78910-ForkProcess-4:1] will be terminated.
2024-08-16 11:43:46 +0000   78628 execution.bulk     INFO     The thread monitoring the process [78917-ForkProcess-4:2] will be terminated.
2024-08-16 11:43:46 +0000   78910 execution.bulk     INFO     The process [78910] has received a terminate signal.
2024-08-16 11:43:46 +0000   78917 execution.bulk     INFO     The process [78917] has received a terminate signal.
======= Run Summary =======

Run name: "main_my_python_tool_3wwnzg7z_20240816_114342_031288"
Run status: "Completed"
Start time: "2024-08-16 11:43:42.030789+00:00"
Duration: "0:00:05.181931"
Output path: "/home/vscode/.promptflow/.runs/main_my_python_tool_3wwnzg7z_20240816_114342_031288"

Completed
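Since the only difference between the failing and the working run above is the azure-ai-ml version, a small guard can flag the version seen to fail before a run is submitted. This is a minimal sketch: the function names are hypothetical, and the two version sets contain only what this report tested (1.19.0 fails, 1.18.0 works). Any other version is reported as untested rather than guessed at.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: only these two versions were exercised in this report.
KNOWN_BAD = {"1.19.0"}   # upload fails with AuthorizationPermissionMismatch
KNOWN_GOOD = {"1.18.0"}  # upload succeeds against the same resources


def installed_azure_ai_ml() -> Optional[str]:
    """Return the installed azure-ai-ml version, or None if it is not installed."""
    try:
        return version("azure-ai-ml")
    except PackageNotFoundError:
        return None


def classify_azure_ai_ml(installed: Optional[str]) -> str:
    """Classify a version string against what this report observed."""
    if installed is None:
        return "not installed"
    if installed in KNOWN_BAD:
        return "known bad"
    if installed in KNOWN_GOOD:
        return "known good"
    return "untested"
```

Calling `classify_azure_ai_ml(installed_azure_ai_ml())` before `localclient.run(...)` would give an early warning on an environment that still has 1.19.0 installed.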


Additional context

I don't know if I should file this over at https://github.com/Azure/azure-sdk-for-python/ instead, but people running into the problem here might benefit from this report, and promptflow could certainly use a more informative error message. Sorry, I can't do any more debugging: I've already lost a whole day trying to figure this out, and it works on the older version.
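As an illustration of what a more informative error message could look like, here is a rough sketch. It is not promptflow's actual code: `describe_upload_error` and the hint table are hypothetical, and the sketch assumes the exception carries an `error_code` attribute (the traceback above prints `ErrorCode:AuthorizationPermissionMismatch`), which it reads defensively with `getattr`.

```python
# Hypothetical hint table; the advice text summarizes this thread only.
UPLOAD_ERROR_HINTS = {
    "AuthorizationPermissionMismatch": (
        "The identity used for the upload lacks data-plane access to the "
        "workspace storage account. Check that it has the Storage Blob Data "
        "Contributor role. If the role is already assigned, note that this "
        "error was reported with azure-ai-ml==1.19.0 and not with 1.18.0."
    ),
}


def describe_upload_error(exc: BaseException) -> str:
    """Build a message that keeps the original error and appends a hint if one is known."""
    code = getattr(exc, "error_code", None)
    # The code may be a plain string or an enum member; normalize to a string.
    code = getattr(code, "value", code)
    message = f"Failed to upload run to cloud: {exc}"
    hint = UPLOAD_ERROR_HINTS.get(code) if isinstance(code, str) else None
    if hint:
        message += f"\nHint ({code}): {hint}"
    return message
```

An uploader could wrap its `upload_blob` call in `try/except` and log `describe_upload_error(exc)` before re-raising, so the raw storage error stays visible alongside the hint.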

0mza987 commented 3 weeks ago

Hi @nicsuzor, thanks for the feedback. Prompt flow uses azure.ai.ml for authentication. I'm not sure what has changed in azure.ai.ml, but based on the error message, you could try granting yourself the Storage Blob Data Contributor RBAC role on the related storage account. If you are not sure which storage account it is, you can also assign the role at the resource group level.
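One way to check whether the role assignment has taken effect, independently of promptflow, is to write a small blob directly. The sketch below makes several assumptions: `can_write_blobs`, `probe_workspace_storage`, and the probe blob name are hypothetical, and it authenticates with `DefaultAzureCredential`, which may not be the same credential that azure.ai.ml resolves for the upload. A successful probe therefore shows that the role works for that credential, not necessarily for the one promptflow uses.

```python
PROBE_BLOB = "promptflow-permission-probe.txt"  # hypothetical blob name


def can_write_blobs(container_client, blob_name: str = PROBE_BLOB) -> bool:
    """Return True if a test blob can be written, False on a permission mismatch."""
    try:
        container_client.upload_blob(name=blob_name, data=b"probe", overwrite=True)
    except Exception as exc:
        code = getattr(exc, "error_code", None)
        code = getattr(code, "value", code)  # the code may be an enum member
        if code == "AuthorizationPermissionMismatch":
            return False
        raise  # anything else is a different problem; do not hide it
    container_client.delete_blob(blob_name)  # clean up the probe blob
    return True


def probe_workspace_storage(account_url: str, container_name: str) -> bool:
    """Run the probe against a real container (needs azure-identity and azure-storage-blob)."""
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import ContainerClient

    client = ContainerClient(
        account_url=account_url,        # e.g. "https://<account>.blob.core.windows.net"
        container_name=container_name,  # the workspace's blob container
        credential=DefaultAzureCredential(),
    )
    return can_write_blobs(client)
```

If the probe returns True while the promptflow upload still fails, that would support the view that the problem lies in how the credential is obtained rather than in the role assignment itself.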

nicsuzor commented 3 weeks ago

Yes, thank you @0mza987, I did try that, and gave the permission to the service account as well, and then recreated a new resource group in case I messed something up and did it all again. It's entirely possible I missed something, but it doesn't seem like it actually is a permissions issue since it works with the same resources if I just downgrade the azure.ai.ml library. Happy to close if it's not relevant here, just thought this might be helpful to others.