microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
https://microsoft.github.io/promptflow/
MIT License
9.12k stars 827 forks

[BUG] Tracing Contextvar reset #3538

Open Gzozo opened 1 month ago

Gzozo commented 1 month ago

Describe the bug When I call the AsyncOpenAI.chat.completions.create function, the context_var in the promptflow.tracing.Tracer class is reset, so the metrics, such as token usage, are not saved.

How To Reproduce the bug Steps to reproduce the behavior, every time:

  1. Create a new tool
  2. Create an AsyncAzureOpenAI object (I only have access to OpenAI through Azure)
  3. Call the chat.completions.create function

Expected behavior At the end of the flow, there should be metrics for the run, such as token usage. The api_calls should also not be empty, because I called an API (the OpenAI API).

Screenshots Before the HTTP call: [image]

After the HTTP call: [image]

Tracer.active_instance is None after the HTTP call, which is why the metrics cannot be saved.


Additional context I am using the flow as a function.

zhengfeiwang commented 1 month ago

Hi @Gzozo, thanks for reporting this. Heyi, the expert on this issue, is on vacation this week. We will triage it next week once Heyi is back.

liucheng-ms commented 1 month ago

Hi @Gzozo ,

Thank you for reaching out and providing detailed information about the issue you’re encountering with missing metrics when using the AsyncOpenAI API. We appreciate your efforts in outlining the steps to reproduce the bug and the expectation for the metrics to be recorded, including token usage and API calls.

To ensure I fully understand the situation and to assist you more effectively, I've attempted to replicate the issue in my local environment. I’d like to share the steps I followed and how I checked the metrics:

Steps

from promptflow import tool
from promptflow.tracing import trace
from promptflow.connections import AzureOpenAIConnection
from promptflow.tools.common import normalize_connection_config
from openai import AsyncAzureOpenAI

@trace
async def chat(connection: AzureOpenAIConnection, question: str, stream: bool = False):
    connection_dict = normalize_connection_config(connection)
    client = AsyncAzureOpenAI(**connection_dict)

    messages = [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": question}]
    response = await client.chat.completions.create(model="gpt-35-turbo", messages=messages, stream=stream)

    if stream:
        # The chunks are consumed and joined inside the traced function, so the
        # full response is available before the tracer is finalized.
        async def generator():
            async for chunk in response:
                if chunk.choices:
                    yield chunk.choices[0].delta.content or ""
        return "".join([chunk async for chunk in generator()])
    return response.choices[0].message.content or ""

@tool
async def my_python_tool(connection: AzureOpenAIConnection, question: str, stream: bool) -> str:
    return await chat(connection, question, stream)
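
To check the metrics, I ran the flow programmatically and inspected the resulting line result. Here is a minimal sketch of that check, assuming the flow is executed through FlowExecutor; the flow path and connection payload below are placeholders, and the exact attribute layout (system_metrics, api_calls) may differ between promptflow versions:

from promptflow.executor import FlowExecutor

# Placeholder flow path and connection payload for this sketch.
connections = {
    "my_aoai_connection": {
        "type": "AzureOpenAIConnection",
        "value": {"api_key": "<api-key>", "api_base": "<endpoint>"},
    }
}
executor = FlowExecutor.create("./my_flow/flow.dag.yaml", connections)
line_result = executor.exec_line({"question": "What is Promptflow?", "stream": False})

# The flow run info is expected to carry aggregated system metrics (e.g. token
# counts), and each node run info the recorded api_calls.
print(line_result.run_info.system_metrics)
for node_name, node_run_info in line_result.node_run_infos.items():
    print(node_name, node_run_info.api_calls)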

Based on this, it appears that the metrics are being collected correctly in my case.

Since you've encountered this issue, there must be some difference in how we are reproducing the problem. Would you be willing to share your code snippet and the specific steps you're taking to reproduce it? This will help us get on the same page and troubleshoot the problem more effectively.

Once we have this information, we can better understand what might be going wrong and how we can assist you further.

Thank you for your cooperation, and I look forward to your response.

Best regards.

Gzozo commented 1 month ago

Hi @liucheng-ms ,

Thanks for your reply. Most of the code you wrote matches mine, but one crucial difference is that in the streaming case I do not collect the chunks; I return the async generator itself. This is because of my use case: I want to stream the response back to the client, so I need to send the chunks on as soon as possible.
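
Concretely, my version of the traced function looks roughly like this (a minimal sketch based on your snippet above; only the handling of the streaming response differs):

from openai import AsyncAzureOpenAI
from promptflow import tool
from promptflow.connections import AzureOpenAIConnection
from promptflow.tools.common import normalize_connection_config
from promptflow.tracing import trace

@trace
async def chat(connection: AzureOpenAIConnection, question: str):
    client = AsyncAzureOpenAI(**normalize_connection_config(connection))
    messages = [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": question}]
    response = await client.chat.completions.create(model="gpt-35-turbo", messages=messages, stream=True)

    async def generator():
        async for chunk in response:
            if chunk.choices:
                yield chunk.choices[0].delta.content or ""

    # The generator is returned unconsumed so the caller can forward chunks to
    # the client as they arrive; the traced function returns before any chunk is read.
    return generator()

@tool
async def my_python_tool(connection: AzureOpenAIConnection, question: str):
    return await chat(connection, question)

This is when the metrics and api_calls end up empty on my side.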

Thanks in advance.

liucheng-ms commented 1 month ago

Hi @Gzozo ,

Thank you for sharing the specifics of your implementation. I understand the importance of streaming the response to your client promptly and the necessity of using the AsyncGenerator function in your use case.

Regarding the Tracer's inability to track metrics for streaming operations, you are correct that this is a limitation of the current design of the Tracer. The Tracer instance is finalized after the API call returns but before the async generator has been fully consumed, which results in the metrics not being captured for streaming responses.

To monitor metrics in scenarios like yours, I suggest utilizing the tracing portal provided by Promptflow. This feature is part of Promptflow's enhanced tracing capabilities and can be found here: https://microsoft.github.io/promptflow/how-to-guides/tracing/index.html

The tracing portal allows you to view detailed metrics in a Web-UI hosted locally. It's a more robust solution for examining the metrics related to your API calls, including those involving streaming.
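
For reference, enabling the collector and local UI is a one-liner; a minimal sketch based on the linked guide (run it once before invoking your flow):

from promptflow.tracing import start_trace

# Starts the local trace collector and prints the URL of the locally hosted
# trace UI; traced calls made afterwards in this process show up there.
start_trace()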

Please give it a try, and let us know if it helps meet your requirements for metric tracking. Your feedback is invaluable in improving our tools and services.

Thank you for your patience and understanding.

Best regards.

liucheng-ms commented 1 month ago

Hi @Gzozo ,

I hope this message finds you well. I want to follow up on the previous conversation regarding the challenge you faced with the metrics not being captured for streaming responses using the AsyncGenerator function.

Have you had an opportunity to try out the tracing portal feature as suggested? It would be great if you could confirm whether this solution has addressed your needs for tracking metrics during streaming operations.

If the tracing portal does meet your requirements, we can consider this issue resolved and proceed to close it. However, if you're still encountering difficulties or if the solution doesn't fully cater to your use case, we're here to provide further assistance and explore alternative approaches.

Your feedback is essential for us to ensure that our tools are as effective and user-friendly as possible. We look forward to your response.

Thank you for your time and cooperation.

Best regards.

Gzozo commented 1 month ago

Hi @liucheng-ms,

Yes, I had time yesterday to look at the portal you suggested, but as far as I can tell it is essentially only for development and local testing. I want to use the metrics in a production environment to build statistics from them. That is why I need to access the metrics at the end of the flow run, save them to a database, and later visualize them in dashboards. So as I see it, the tracing portal does not help me.

Thanks for your reply,

liucheng-ms commented 1 month ago

Hi @Gzozo ,

Thank you for your response and for clarifying your use case. I'm interested in understanding more about how you are using Promptflow in your production environment, as well as the specific requirements for collecting and utilizing metrics.

Could you please provide more details on what you mean by using Promptflow in production? Specifically, how do you collect and manage statistics data from your production environment? I'm also curious about why the tracing portal doesn't meet your needs for metric tracking in this context.

Your insights will help us better understand your requirements and explore potential solutions that align with your production needs.

Thank you for your time and cooperation.

Best regards.

Gzozo commented 1 month ago

Hi @liucheng-ms,

I use Promptflow's flows as functions. I built a socketify Python server around them that calls the flow functions. When I deploy it to Azure in a Docker container, I cannot access the tracing URL, because it only listens on localhost rather than on public addresses, so I can't use it. I also believe it would not have the relevant LLM token statistics anyway, because the LineResult object I get when invoking the flow does not contain them.

So, because I cannot access the deployed server's statistics, I save them in a Postgres database.

Best regards,

liucheng-ms commented 3 weeks ago

Hi @Gzozo ,

Thank you for explaining your scenario in detail. I appreciate your patience and your feedback is invaluable to us.

Based on what you described, the behavior you're encountering is expected given the current design of the Promptflow tracing system. The legacy tracing system has inherent limitations with streaming responses and won't receive new capabilities moving forward; we are actively working on a new tracing portal experience that addresses some of those limitations.

Suggestions for Resolving Your Issue

Non-Streaming Mode

If capturing token usage metrics is critical and you are open to using non-streaming mode, you can switch to this mode to see if it satisfies your requirements. This way, the metrics should be captured accurately.
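
For example, with stream=False the OpenAI response object itself carries the usage counts, which the tracer can then record. A minimal sketch (the client is an AsyncAzureOpenAI instance as in the earlier snippet, and the deployment name is a placeholder):

from openai import AsyncAzureOpenAI

async def chat_with_usage(client: AsyncAzureOpenAI, question: str):
    messages = [{"role": "user", "content": question}]
    # stream=False returns a single ChatCompletion that includes a usage block.
    response = await client.chat.completions.create(
        model="gpt-35-turbo", messages=messages, stream=False
    )
    # usage exposes prompt_tokens, completion_tokens and total_tokens.
    return response.choices[0].message.content or "", response.usage.total_tokens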

Streaming Mode with Telemetry

For your use case involving streaming responses and token metrics, I recommend utilizing the Promptflow Tracing package. This package integrates with OpenTelemetry, allowing you to emit telemetry data compatible with OpenTelemetry standards.

You can set up a custom OpenTelemetry exporter to capture the trace data during the execution of Promptflow. Here's a quick way to get started:
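
Below is a minimal sketch of such an exporter, assuming promptflow's traced spans are emitted through the global OpenTelemetry tracer provider; the attribute names you will find on the spans (including any token-usage fields) vary by version, so inspect span.attributes to locate the ones you need:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, SpanExporter, SpanExportResult

class MetricsExporter(SpanExporter):
    """Collects finished spans so their attributes can be persisted elsewhere."""

    def export(self, spans) -> SpanExportResult:
        for span in spans:
            # Inspect span.attributes for the fields you care about and write
            # them to your own store (e.g. the Postgres database you mentioned).
            print(span.name, dict(span.attributes))
        return SpanExportResult.SUCCESS

    def shutdown(self) -> None:
        pass

# Register the provider and exporter before executing the flow so that the
# spans produced during the run are routed to the custom exporter.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(MetricsExporter()))
trace.set_tracer_provider(provider)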

Future Enhancements

Your request highlights an important use case: obtaining trace data in a deployed container environment. We are considering enhancements to provide an easier, more user-friendly solution for capturing trace data without requiring custom exporters.

We truly appreciate your feedback, as it helps us improve our tools and better align with user needs. However, I would like to clarify that this is not a bug; it is a limitation of the current product design.

Thank you once again for your feedback and patience. Please let us know if you have any further questions or need assistance with the tracing package setup.

Best regards.