open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0
698 stars 579 forks source link

FastAPI/Starlette - Inconsistent placement of the OpenTelemetryMiddleware #795

Open phillipuniverse opened 2 years ago

phillipuniverse commented 2 years ago

The Starlette application class (which is the parent of the FastAPI application class) hardcodes the ServerErrorMiddleware to appear as the very first middleware. This means that if there is an error in some downstream middleware, handling the request with ServerErrorMiddleware will not have an active trace.

Also, if you happen to do something like this:

app = FastAPI(...)

app.add_middleware(...)
app.add_middlware(....)

The additional middleware added after the first __init__ of FastAPI will also not have an active trace established.

There is a relatively easy way around this to ensure that the OpenTelemetry middleware is always first:

class _InstrumentedFastAPI(fastapi.FastAPI):
    _tracer_provider = None
    _excluded_urls = None
    _server_request_hook: _ServerRequestHookT = None
    _client_request_hook: _ClientRequestHookT = None
    _client_response_hook: _ClientResponseHookT = None

    def build_middleware_stack(self) -> ASGIApp:
        original_mw_app = super().build_middleware_stack()
        return OpenTelemetryMiddleware(
            original_mw_app,
            excluded_urls=_InstrumentedFastAPI._excluded_urls,
            default_span_details=_get_route_details,
            server_request_hook=_InstrumentedFastAPI._server_request_hook,
            client_request_hook=_InstrumentedFastAPI._client_request_hook,
            client_response_hook=_InstrumentedFastAPI._client_response_hook,
            tracer_provider=_InstrumentedFastAPI._tracer_provider,
        )

FastAPI version - 0.70.0 Starlette version - 0.16.0 Python version - 3.9

phillipuniverse commented 2 years ago

Slightly more concrete use case:

    application = FastAPI(
        title="My API"
    )

    async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
        logger.exception("Unhandled exception")

        return JSONResponse(
            status_code=HTTP_500_INTERNAL_SERVER_ERROR,
            content=jsonable_encoder({"title": "Unhandled error"),
        )

    application.add_exception_handler(Exception, unhandled_exception_handler)

In this case the add_exception_handler is hooked up as the error handler for ServerErrorMiddleware. Because of this and because the OpenTelemetry middleware is installed as "user middleware", the logger.exception invocation within the unhandled_exception_handler will not have a trace id or span id as the span has already ended.

phillipuniverse commented 2 years ago

The workaround for now is to add an opentelemetry_pre_instrument entrypoint. Here is my patching function:

import opentelemetry
import typing
from starlette.types import ASGIApp
from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware
from opentelemetry.instrumentation.fastapi import _InstrumentedFastAPI
from opentelemetry.instrumentation.fastapi import _get_route_details

def patch_middleware_ordering():
    """
    Replaces the default OpenTelemetry instrumentation class with one that is assured to always be added to the very
    beginning of the middleware stack, even before ServerMiddleware. Necessary until https://github.com/open-telemetry/opentelemetry-python-contrib/issues/795
    is resolved
    """

    class OpenTelemetryFirstMiddlewareFastApi(_InstrumentedFastAPI):

        def add_middleware(self, middleware_class: type, **options: typing.Any) -> None:
            if middleware_class == OpenTelemetryMiddleware:
                return
            else:
                super().add_middleware(middleware_class, **options)

        def build_middleware_stack(self) -> ASGIApp:
            app = super().build_middleware_stack()
            return OpenTelemetryMiddleware(
                app,
                excluded_urls=_InstrumentedFastAPI._excluded_urls,
                default_span_details=_get_route_details,
                server_request_hook=_InstrumentedFastAPI._server_request_hook,
                client_request_hook=_InstrumentedFastAPI._client_request_hook,
                client_response_hook=_InstrumentedFastAPI._client_response_hook,
                tracer_provider=_InstrumentedFastAPI._tracer_provider,
            )

    opentelemetry.instrumentation.fastapi._InstrumentedFastAPI = OpenTelemetryFirstMiddlewareFastApi

Then you need an entrypoint for fastapi under opentelemtry_pre_instrument. In my case I am using Poetry so my pyproject.toml looks like this:

[tool.poetry.plugins."opentelemetry_pre_instrument"]
fastapi = "mylibrary.tracing.contrib.fastapi:patch_middleware_ordering"

With this change I am able to see trace/span ids in logger invocations from ServerErrorMiddleware.

I would like to contribute this back but the problem is I don't think this will work for the non-auto-instrument case which does add_middleware:

https://github.com/open-telemetry/opentelemetry-python-contrib/blob/59933299ef35d7112f4bb9e8081e7ca5f627373c/instrumentation/opentelemetry-instrumentation-fastapi/src/opentelemetry/instrumentation/fastapi/__init__.py?_pjax=%23js-repo-pjax-container%2C%20div%5Bitemtype%3D%22http%3A%2F%2Fschema.org%2FSoftwareSourceCode%22%5D%20main%2C%20%5Bdata-pjax-container%5D#L109-L141

This whole process is a little shady anyway. It seems a little strange to me that Starlette/FastAPI does not give first-class support to fully control the final middleware ordering.

Also let me just say, I have been really impressed with all of the extension points that I have seen within this project. It was really nice to be able to self-serve and unblock myself in a not as nasty way. Great work!!

lskrajny-marcura commented 1 year ago

Hi, is it possible to monkey patch it without poetry, trying to do in multiple ways, but OpenTelemetryFirstMiddlewareFastApi class is not used :/

phillipuniverse commented 1 year ago

@lskrajny-marcura depends on how you are kicking off the instrumentation process. The poetry part above is not important per se, the important part is that you add the monkey patching as an entry_point.txt such that it runs prior to any default open telemetry instrumentation.

lskrajny-marcura commented 1 year ago

Have a simple app with main.py where i am first creating FastApi/Starlette app and then manually adding all opentelemetry instrumentations.

Having patching on top of main.py seems "too late"

lzchen commented 1 year ago

@phillipuniverse Do you foresee any repercussions for adding the tracing middleware first? Would it conflict with other middlewares?

phillipuniverse commented 1 year ago

Do you foresee any repercussions for adding the tracing middleware first? Would it conflict with other middlewares?

@lzchen it has not in my experience. I have been running this patch in production since November 9, 2021 with no issues.

I did have to make one change in FastAPI 0.92/Starlette 0.24 with the change to lazily-instantiated middleware but it was pretty straightforward (I'll update my answer above to include the change, it's not there currently).

In my scenario this change is crucial for us to be able to be able to annotate unhandled exceptions in our application with a trace id in our external error reporting system.

FWIW I would consider the main problem to be a FastAPI problem where it limits full control over the entire middleware list.

lzchen commented 1 year ago

@phillipuniverse Makes sense to me. Would you consider creating a PR for this?

phillipuniverse commented 1 year ago

No problem!

Kyle-sandeman-mrdfood commented 7 months ago

Hi, this is still an issue even in fastapi==0.103.2.
I am using both app.add_middleware(MyCustomMW) and custom exception handlers

Here are my debugging steps, if it will help anyone understand the problem better

In your asgi instrumentation's __init__.py I added two print statements to show exactly when the span starts and whether it is recording (i.e. in the with ..use_span and if ..is_recording) With auto instrumentation I get these logs

[request begins]
------- MyCustomMW: current span = None 
---- OTelMW: started span
---- OTelMW: recording
[other logs]
------- MyCustomMW: cleanup step, span = None
---- OTelMW: finally clause reached

And after using a patch similar to the one posted earlier in this issue:

otel_mw = find_otel_mw()
cls, options = otel_mw

app.user_middleware.remove(otel_mw)
stack = app.build_middleware(stack)
app.middleware_stack = cls(app=stack, **options)

Then I get:

[request begins]
---- OTelMW: started span
---- OTelMW: recording
------- MyCustomMW: current span = _Span(name...)
[other logs]
------- MyCustomMW: cleanup step, span = _Span(name=...)
---- OTelMW: finally clause reached

And just to inspect the middleware stack creation, I added prints to the various __init__ functions: original:

init OpenTelemetryInstrumentation middleware
init MyCustomMW middleware

patched:

init MyCustomMW middleware
init OpenTelemetryInstrumentation middleware