microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
https://microsoft.github.io/promptflow/
MIT License
8.69k stars 779 forks

[BUG] Potential incompatible with mlflow>=2.13.1 #3408

Open zhengfeiwang opened 3 weeks ago

zhengfeiwang commented 3 weeks ago

Describe the bug In mlflow>=2.13.1, mlflow can force-reset the global tracer provider during some operations (related PR: https://github.com/mlflow/mlflow/pull/12137), which may break or disable the tracing feature in prompt flow.

How To Reproduce the bug Setting up mlflow after prompt flow triggers the issue. The snippet below demonstrates it:

from mlflow.tracing.provider import reset_tracer_setup
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, SpanExporter, SpanExportResult
from promptflow.tracing import trace as pf_trace

class StdoutSpanExporter(SpanExporter):
    def export(self, spans):
        for span in spans:
            print("trace id:", span.context.trace_id)
            print("span id:", span.context.span_id)
        return SpanExportResult.SUCCESS

    def shutdown(self):
        pass

@pf_trace
def do_something():
    print("Doing something")

def setup_exporter():
    trace.set_tracer_provider(TracerProvider())
    tracer_provider: TracerProvider = trace.get_tracer_provider()
    stdout_span_exporter = StdoutSpanExporter()
    tracer_provider.add_span_processor(BatchSpanProcessor(stdout_span_exporter))

def main():
    setup_exporter()
    reset_tracer_setup()  # this line will force reset tracer provider to NoOpTracerProvider
    do_something()

if __name__ == "__main__":
    main()

When reset_tracer_setup is executed, all registered exporters are reset, so nothing is printed to stdout when the spans end. We are not sure how or when mlflow calls this function, but it appears to break or disable prompt flow's tracing feature because it resets the exporter.
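To illustrate why the reset silently drops spans, here is a stdlib-only miniature of the failure mode. All names here (RecordingProvider, NoOpProvider, end_span, the module-level reset function) are illustrative stand-ins, not the real OpenTelemetry or mlflow APIs:

```python
class RecordingProvider:
    """Stands in for an SDK TracerProvider with registered span processors."""
    def __init__(self):
        self.exported = []

    def end_span(self, name):
        # A real provider would hand the finished span to its processors/exporters.
        self.exported.append(name)


class NoOpProvider:
    """Stands in for NoOpTracerProvider: no processors, spans vanish."""
    def end_span(self, name):
        pass  # span is dropped silently


_global_provider = RecordingProvider()


def reset_tracer_setup():
    # What a force-reset effectively does to downstream users of the
    # global provider: swap it for a no-op, discarding their exporters.
    global _global_provider
    _global_provider = NoOpProvider()


app_provider = _global_provider
app_provider.end_span("span-1")     # recorded by the app's exporter
reset_tracer_setup()
_global_provider.end_span("span-2")  # dropped: the no-op provider took over
print(app_provider.exported)         # only "span-1" was ever exported
```

The app's exporter never sees an error; it simply stops receiving spans, which matches the missing stdout in the repro above.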

Expected behavior Operations in mlflow should not modify the global tracer provider; doing so violates the OpenTelemetry standard. It looks like prompt flow can take no action here, so this issue serves as an FYI for now.

Additional context Older versions run into an AttributeError complaining that "NoOpTracerProvider" has no attribute "add_span_processor"; there is a PR in progress to skip the setup and log a warning instead: #3407
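The defensive check described for #3407 could be sketched as below. This is a hypothetical helper (try_add_span_processor is not a real prompt flow function), shown only to make the guard concrete:

```python
import logging

logger = logging.getLogger(__name__)


def try_add_span_processor(tracer_provider, span_processor):
    """Attach a span processor, skipping providers that cannot accept one.

    NoOpTracerProvider (what the global provider becomes after a
    force-reset) has no add_span_processor, so guard before calling it
    and log a warning instead of raising AttributeError.
    """
    if not hasattr(tracer_provider, "add_span_processor"):
        logger.warning(
            "Tracer provider %s does not support span processors; "
            "skipping tracing setup.",
            type(tracer_provider).__name__,
        )
        return False
    tracer_provider.add_span_processor(span_processor)
    return True
```

With this guard, running under a reset provider degrades to a warning rather than a crash, while normal SDK providers are configured as before.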

zhengfeiwang commented 2 weeks ago

Opened an issue in the mlflow repo: https://github.com/mlflow/mlflow/issues/12341

zhengfeiwang commented 5 days ago

mlflow has fixed this in https://github.com/mlflow/mlflow/pull/12457: mlflow now operates only on its own OTel components instead of the global ones, so the issue should be mitigated. Will close this once the patch is released and verified.
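Until the patched release is out, a version check can flag the affected range. The exact release carrying the fix is not known at the time of writing, so this sketch (both helper names are hypothetical) only detects versions at or above 2.13.1, where the force-reset behavior was introduced:

```python
def version_tuple(version: str) -> tuple:
    """Parse the leading numeric components of a version string.

    Stops at the first non-numeric part, so "2.14.0rc0" -> (2, 14).
    """
    parts = []
    for piece in version.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)


def may_reset_global_tracer(mlflow_version: str) -> bool:
    # mlflow >= 2.13.1 may force-reset the global tracer provider;
    # the upstream fix (mlflow PR #12457) is not yet in a release we
    # can pin against, so treat this as a heuristic.
    return version_tuple(mlflow_version) >= (2, 13, 1)
```

A caller could pair this with `importlib.metadata.version("mlflow")` to log a warning at tracing setup time when an affected mlflow version is installed alongside prompt flow.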