microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
https://microsoft.github.io/promptflow/
MIT License
9.59k stars 878 forks

[BUG] Cannot connect to promptflow tracing during release deploy pipeline #3391

Closed tyler-suard-parker closed 5 months ago

tyler-suard-parker commented 5 months ago

Describe the bug

Promptflow tracing requires: az login, pf config set trace.destination....., etc.

When I try to run this as part of my deployment automation workflow in Devops, I get this error every time:

2024-06-07T17:19:50.9316803Z ERROR: The command failed with an unexpected error. Here is the traceback:
2024-06-07T17:19:50.9352168Z ERROR: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /organizations/v2.0/.well-known/openid-configuration (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001E92283EE10>: Failed to establish a new connection: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions'))
2024-06-07T17:19:50.9353333Z Traceback (most recent call last):
2024-06-07T17:19:50.9353863Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connection.py", line 174, in _new_conn
2024-06-07T17:19:50.9354502Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/util/connection.py", line 95, in create_connection
2024-06-07T17:19:50.9355180Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/util/connection.py", line 85, in create_connection
2024-06-07T17:19:50.9355839Z PermissionError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions
2024-06-07T17:19:50.9356109Z
2024-06-07T17:19:50.9356463Z During handling of the above exception, another exception occurred:
2024-06-07T17:19:50.9357845Z
2024-06-07T17:19:50.9358145Z Traceback (most recent call last):
2024-06-07T17:19:50.9358642Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connectionpool.py", line 715, in urlopen
2024-06-07T17:19:50.9359316Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connectionpool.py", line 404, in _make_request
2024-06-07T17:19:50.9359956Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connectionpool.py", line 1058, in _validate_conn
2024-06-07T17:19:50.9360610Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connection.py", line 363, in connect
2024-06-07T17:19:50.9361281Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connection.py", line 186, in _new_conn
2024-06-07T17:19:50.9362096Z urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000001E92283EE10>: Failed to establish a new connection: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions
2024-06-07T17:19:50.9362612Z
2024-06-07T17:19:50.9362938Z During handling of the above exception, another exception occurred:
2024-06-07T17:19:50.9363167Z
2024-06-07T17:19:50.9363427Z Traceback (most recent call last):
2024-06-07T17:19:50.9363909Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\requests/adapters.py", line 486, in send
2024-06-07T17:19:50.9364541Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connectionpool.py", line 827, in urlopen
2024-06-07T17:19:50.9365184Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connectionpool.py", line 799, in urlopen
2024-06-07T17:19:50.9365820Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/util/retry.py", line 592, in increment
2024-06-07T17:19:50.9366983Z urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /organizations/v2.0/.well-known/openid-configuration (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001E92283EE10>: Failed to establish a new connection: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions'))
2024-06-07T17:19:50.9367764Z
2024-06-07T17:19:50.9368136Z During handling of the above exception, another exception occurred:
2024-06-07T17:19:50.9368337Z
2024-06-07T17:19:50.9368572Z Traceback (most recent call last):
2024-06-07T17:19:50.9368849Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\knack/cli.py", line 233, in invoke
2024-06-07T17:19:50.9369231Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 664, in execute
2024-06-07T17:19:50.9369645Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 731, in _run_jobs_serially
2024-06-07T17:19:50.9370051Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 701, in _run_job
2024-06-07T17:19:50.9370453Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 334, in __call__
2024-06-07T17:19:50.9370864Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/command_operation.py", line 121, in handler
2024-06-07T17:19:50.9371262Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/command_modules/profile/custom.py", line 139, in login
2024-06-07T17:19:50.9371654Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 154, in login
2024-06-07T17:19:50.9372052Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/auth/identity.py", line 160, in login_with_auth_code
2024-06-07T17:19:50.9373083Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/auth/identity.py", line 119, in _msal_app
2024-06-07T17:19:50.9373474Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\msal/application.py", line 1814, in __init__
2024-06-07T17:19:50.9373848Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\msal/application.py", line 504, in __init__
2024-06-07T17:19:50.9374201Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\msal/authority.py", line 115, in __init__
2024-06-07T17:19:50.9374582Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\msal/authority.py", line 182, in tenant_discovery
2024-06-07T17:19:50.9374964Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\msal/individual_cache.py", line 269, in wrapper
2024-06-07T17:19:50.9375325Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\requests/sessions.py", line 602, in get
2024-06-07T17:19:50.9375693Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\requests/sessions.py", line 589, in request
2024-06-07T17:19:50.9376061Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\requests/sessions.py", line 703, in send
2024-06-07T17:19:50.9376405Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\requests/adapters.py", line 519, in send
2024-06-07T17:19:50.9377099Z requests.exceptions.ConnectionError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /organizations/v2.0/.well-known/openid-configuration (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001E92283EE10>: Failed to establish a new connection: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions'))
2024-06-07T17:19:50.9385986Z To check existing issues, please visit: https://github.com/Azure/azure-cli/issues
2024-06-07T17:19:51.5542663Z pf : The term 'pf' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the
2024-06-07T17:19:51.5543369Z spelling of the name, or if a path was included, verify that the path is correct and try again.
2024-06-07T17:19:51.5543918Z At D:\AzureDevOps\Agents\A2_work_temp\61376baa-b908-4c17-ad64-dc2bcd9fd926.ps1:5 char:1
2024-06-07T17:19:51.5544302Z + pf config set trace.destination="azureml://subscriptions/a06dfadd-e31 ...
2024-06-07T17:19:51.5544642Z + ~~
2024-06-07T17:19:51.5545002Z + CategoryInfo : ObjectNotFound: (pf:String) [], ParentContainsErrorRecordException
2024-06-07T17:19:51.5545404Z + FullyQualifiedErrorId : CommandNotFoundException
2024-06-07T17:19:51.5546027Z
2024-06-07T17:19:51.5757871Z ##[debug]Exit code: 1
2024-06-07T17:19:51.5797694Z ##[debug]Leaving Invoke-VstsTool.
2024-06-07T17:19:51.6194256Z ##[error]PowerShell exited with code '1'.
2024-06-07T17:19:51.6201865Z ##[debug]Processed: ##vso[task.logissue type=error;source=TaskInternal]PowerShell exited with code '1'.
2024-06-07T17:19:51.6315367Z ##[debug]Processed: ##vso[task.complete result=Failed]Error detected
2024-06-07T17:19:51.6337231Z ##[debug]Leaving D:\AzureDevOps\Agents\A2_work_tasks\PowerShell_e213ff0f-5d5c-4791-802d-52ea3e7be1f1\2.239.1\powershell.ps1.
2024-06-07T17:19:51.6615458Z ##[section]Finishing: PowerShell Script

How To Reproduce the bug

Steps to reproduce the behavior, and how frequently the bug occurs:

  1. Try to get promptflow tracing to run on an Azure Web Application

zhengfeiwang commented 5 months ago

Hi @tyler-suard-parker, thanks for reaching out. Based on your error message, there appear to be two issues:

  1. az login failed, which might be a permission issue for your Azure web app.
  2. pf does not seem to be installed correctly. I see:

image

You may need to ensure that 1) your Azure web app has sufficient permissions, and 2) pf is correctly installed on the compute.

tyler-suard-parker commented 5 months ago

@zhengfeiwang Thank you for the response. I installed promptflow using python -m pip install promptflow promptflow-tools etc. during the build phase. During the deployment phase, I changed my startup command to do the same. I am getting "requirement already satisfied" for promptflow during the install phase, but the pf command in bash is still not recognized. I am not calling pf directly; it looks like importing promptflow in my Python script tries to run pf on the command line?

Also I am able to log in using azure-cli, but only manually with the interactive web page.

tyler-suard-parker commented 5 months ago

I have tried multiple ways around this, including exporting the binary directory holding pf to PATH, and this is still not working. Please advise on how to get this to work.

zhengfeiwang commented 5 months ago

I see the line below in your error message:

2024-06-07T17:19:51.5543918Z At D:\AzureDevOps\Agents\A2_work_temp\61376baa-b908-4c17-ad64-dc2bcd9fd926.ps1:5 char:1

So I think there is one PowerShell script executing both az login and pf config set. For the Azure CLI login issue, I am not sure what the next step is, because it does not seem to involve prompt flow. I recommend raising your question in the Azure CLI repo: https://github.com/Azure/azure-cli.

The way you installed prompt flow looks good to me, but I am not sure why your compute cannot find the binary. What pf config set trace.destination actually does is 1) validate the Cosmos DB status, and 2) persist this config in pf.yaml. So one workaround is: if you can ensure your Cosmos resource is available, you can modify pf.yaml with simple bash commands instead of pf commands.
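
As a rough illustration of that workaround, the sketch below writes the trace destination into pf.yaml from Python instead of calling pf config set. The nested trace / destination layout and both function names are assumptions that this thread does not confirm, so compare the output against a pf.yaml produced by a working pf config set before relying on it. It skips the Cosmos DB validation step and overwrites any existing file.

```python
from pathlib import Path


def build_trace_destination(subscription: str, resource_group: str, workspace: str) -> str:
    """Assemble the azureml:// URI in the form used with pf config set in this thread."""
    return (
        f"azureml://subscriptions/{subscription}"
        f"/resourcegroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace}"
    )


def write_trace_config(config_path: Path, destination: str) -> None:
    """Write a minimal pf.yaml that holds only the trace destination.

    Assumption: the setting is stored as a nested key (trace -> destination).
    This overwrites an existing file and performs no Cosmos DB validation.
    """
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(f"trace:\n  destination: {destination}\n", encoding="utf-8")
```

A call might look like write_trace_config(Path.home() / ".promptflow" / "pf.yaml", build_trace_destination(...)), where the ~/.promptflow/pf.yaml location is also an assumption to verify on your own machine.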

tyler-suard-parker commented 5 months ago

@zhengfeiwang Thank you for your help. That is correct, I have one powershell script executing both az login and pf config set.

It is good to know that I can modify the pf.yaml file directly, but that still does not help with getting promptflow tracing to run. Whenever I import promptflow tracing:

from promptflow.tracing import start_trace
from promptflow.tracing._integrations._openai_injector import recover_openai_api
start_trace(collection="PGPT")

for some reason it tries to run the "pf" command in the OS, and even though I installed PF twice already, I still get the bash error "file not found". Promptflow was installed in a virtual environment on the build computer, and then that virtual environment and working directory were zipped up and moved to the deployment computer, so that may have something to do with it. However, after deployment I can't modify the file system on that computer because it is read-only.

tyler-suard-parker commented 5 months ago

Here is some more research I have done: the "pf" file has the path for an interpreter at the top:

#!/home/vsts/work/1/s/antenv/bin/python
# -*- coding: utf-8 -*- 
import re
import sys
from promptflow._cli.pf import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main()) 

It looks like that path was created on the build computer, but it did not change when the built package was moved to a different directory on the deployment computer.
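
One way to confirm this kind of mismatch is to read the first line of the script and check whether the interpreter it names exists on the current machine. The following is a minimal diagnostic sketch, not part of promptflow; both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def read_shebang_interpreter(script_path: Path) -> Optional[str]:
    """Return the interpreter path from a script's #! line, or None if there is none."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def has_stale_shebang(script_path: Path) -> bool:
    """True when the script names an interpreter that does not exist on this machine."""
    interpreter = read_shebang_interpreter(script_path)
    return interpreter is not None and not os.path.exists(interpreter)
```

Running has_stale_shebang on the pf script in the deployed environment would return True if the script still points at the build machine's virtual environment.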

zhengfeiwang commented 5 months ago

In that case, since I don't know how you deploy, you may need to make sure the installation happens in your deployment environment. It seems no action is required from prompt flow for this issue, so I plan to close it. Please let me know if you have further questions about this issue.

tyler-suard-parker commented 5 months ago

No @zhengfeiwang, do not close this ticket. My issue is, why did your team write this program so that it forces users to authenticate using azure CLI, making it completely incompatible with other Azure products like WebApps? For every other Azure service, a user can connect by using environment variables. For Azure AI Search, CosmosDB, SQL, OpenAI, Blob Storage, and Cognitive Services, I can easily connect using environment variables and entering strings like API key, endpoint, etc. When your team wrote PromptFlow tracing, you decided to make the user login via azure CLI and then set the trace destination using another command line. That is impossible to do in an Azure web app, according to this post from a microsoft employee: https://learn.microsoft.com/en-us/answers/questions/430646/azure-cli-commands-from-within-the-webapp-web-api. Please change your code so that it is compatible with other Azure products like WebApps. A way to do this would be to connect using strings and environment variables. I am feeling SO frustrated with your software. Promptflow is an Azure product and it should be compatible with other Azure products, and currently it is not.

tyler-suard-parker commented 5 months ago
image image
zhengfeiwang commented 5 months ago

@tyler-suard-parker regarding Azure Web App support, my comment in #3411 should be enough; we'll track this as a potential feature.

Concerning the "pf command not found" issue: prompt flow tracing uses a local service as its OTLP collector, and to start this service it internally executes the command pf service start. Based on your message, there may be a misconfiguration in your deployment environment that leads to the command error. Please ensure that prompt flow is properly installed on your Azure Web App or other runtime environment.
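
Since tracing shells out to pf, a quick pre-flight check before calling start_trace can show whether the command is resolvable at all. This is only a sketch using the standard library's shutil.which; the helper names are hypothetical, and the check says nothing about whether the resolved script can actually run.

```python
import shutil
from typing import Optional


def find_pf_cli(command: str = "pf", search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the CLI if it resolves on PATH (or search_path), else None."""
    return shutil.which(command, path=search_path)


def describe_pf_cli(command: str = "pf", search_path: Optional[str] = None) -> str:
    """Produce a one-line message suitable for a startup log."""
    resolved = find_pf_cli(command, search_path)
    if resolved is None:
        return f"'{command}' was not found; tracing may fail to start its local service"
    return f"'{command}' resolves to {resolved}"
```

Logging describe_pf_cli() at application startup would make a missing or misplaced pf visible before the first trace is attempted.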

Additionally, I'd like to note that using Cosmos DB for trace persistence is associated with an Azure ML workspace or AI project. From your descriptions across various issues, it appears you're looking to store traces from a custom application hosted on an Azure Web App—a scenario we hadn't initially considered. While the Cosmos resource is currently free, future charges could pose a risk, which you should consider.

Finally, I think there is still a workaround for your problem: since prompt flow tracing follows the OpenTelemetry spec, you can write a custom exporter that logs traces to Cosmos DB yourself. This approach would eliminate the need to configure your deployment environment specifically for the "pf" command. In that case, how traces are logged to Cosmos DB is entirely up to you, including the auth part, so I believe you can connect as easily as in your prior experience.
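
To make the custom-exporter idea a little more concrete, here is a sketch of one small piece of it: turning a single span's JSON into a dictionary that could be stored as a document. It assumes the input has the shape produced by ReadableSpan.to_json() in the OpenTelemetry Python SDK; the function name, the collection field, and the choice of the span id as the document id are illustrative assumptions, not anything prompt flow prescribes.

```python
import json
from typing import Any, Dict


def span_json_to_document(span_json: str, collection: str) -> Dict[str, Any]:
    """Reshape one span's JSON into a flat-keyed document.

    Assumes span_json contains a "context" object with "span_id" and
    "trace_id", as emitted by the OpenTelemetry Python SDK.
    """
    span = json.loads(span_json)
    context = span["context"]
    document = dict(span)  # keep every original field
    document["id"] = str(context["span_id"])  # document stores such as Cosmos DB expect a string id
    document["trace_id"] = context["trace_id"]
    document["collection"] = collection  # hypothetical grouping field
    return document
```

In a full exporter (a subclass of SpanExporter from opentelemetry.sdk.trace.export), the export method could call this for each span and hand the result to whatever Cosmos DB client the application already configures from its own settings.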

Please let me know if you need further assistance or have other questions.

tyler-suard-parker commented 5 months ago

I was able to work around this issue by starting from this tutorial: https://github.com/microsoft/promptflow/tree/main/examples/tutorials/flow-deploy/azure-app-service , creating a new azure web app that uses a container for deployment instead of code, and then installing azure-cli and promptflow during the deployment process.

Here is my dockerfile:

# syntax=docker/dockerfile:1
FROM docker.io/continuumio/miniconda3:latest

WORKDIR /

COPY ./flow/requirements.txt /flow/requirements.txt

# gcc is needed to build psutil on macOS
RUN apt-get update && apt-get install -y runit gcc

# create conda environment
RUN conda create -n promptflow-serve python=3.9.16 pip=23.0.1 -q -y && \
    conda run -n promptflow-serve pip install --upgrade pip && \
    conda run -n promptflow-serve pip install -r /flow/requirements.txt && \
    conda run -n promptflow-serve pip install keyrings.alt && \
    conda run -n promptflow-serve pip install gunicorn==20.1.0 && \
    conda run -n promptflow-serve pip install 'uvicorn>=0.27.0,<1.0.0' && \
    conda run -n promptflow-serve pip cache purge && \
    conda clean -a -y

RUN pip install --no-cache-dir --prefer-binary azure-cli
RUN pip install --no-cache-dir --prefer-binary promptflow-azure

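# Replace the placeholders; note that credentials passed in a RUN step are recorded in the image history
RUN az login --username "<your username>" --password "<your password>"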
RUN pf config set trace.destination="azureml://subscriptions/<your subscription>/resourcegroups/<your resource group>/providers/Microsoft.MachineLearningServices/workspaces/<your workspace name>"

COPY ./flow /flow

EXPOSE 8080

COPY ./connections /connections

# reset runsvdir
RUN rm -rf /var/runit
COPY ./runit /var/runit
# grant permission
RUN chmod -R +x /var/runit

COPY ./start.sh /
CMD ["bash", "./start.sh"]

And here is how I modified the runit / run file:

#! /bin/bash

CONDA_ENV_PATH="$(conda info --base)/envs/promptflow-serve"
export PATH="$CONDA_ENV_PATH/bin:$PATH"

ls
ls /connections
pf connection create --file /connections/open_ai_connection.yaml
az login --username "<your username>" --password "<your password>"
pf config set trace.destination="azureml://subscriptions/<your subscription>/resourcegroups/<your resource group>/providers/Microsoft.MachineLearningServices/workspaces/<your workspace>"
WORKER_NUM=${PROMPTFLOW_WORKER_NUM:-"8"}
WORKER_THREADS=${PROMPTFLOW_WORKER_THREADS:-"1"}
SERVING_ENGINE=${PROMPTFLOW_SERVING_ENGINE:-"flask"}
gunicorn_app="promptflow.core._serving.app:create_app(engine='${SERVING_ENGINE}')"
cd /flow
if [ "$SERVING_ENGINE" = "flask" ]; then
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, worker_threads: ${WORKER_THREADS}, app: ${gunicorn_app}"
    gunicorn -w ${WORKER_NUM} --threads ${WORKER_THREADS} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
else
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, app: ${gunicorn_app}"
    gunicorn --worker-class uvicorn.workers.UvicornWorker -w ${WORKER_NUM} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
fi
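
The scripts above pass credentials inline. As one possible variation, and not what was done in this thread, the login command could be assembled from environment variables and use a service principal instead of a user account. The variable names below are assumptions chosen for illustration; the az login flags themselves (--service-principal, --username, --password, --tenant) are standard Azure CLI options.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical environment variable names; use whatever your app settings define.
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")


def build_az_login_command(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build an argument list for a non-interactive service principal login."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return [
        "az", "login", "--service-principal",
        "--username", env["AZURE_CLIENT_ID"],
        "--password", env["AZURE_CLIENT_SECRET"],
        "--tenant", env["AZURE_TENANT_ID"],
    ]
```

The resulting list can be passed to subprocess.run(..., check=True) at container start, which keeps secrets out of the Dockerfile and the run script.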