open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0
733 stars 606 forks source link

Update ASGI instrumentation to better handle websockets communication #505

Open owais opened 3 years ago

owais commented 3 years ago

Describe your environment ASGI instrumentation today supports both HTTP and websockets requests but treats them the same. I think it works correctly for HTTP but needs to be changed when ti comes to websockets.

HTTP

Like all other instrumentations, ASGI instrumentation correctly generates one trace per HTTP request. Due to it's async nature, it generates 3 spans instead of 1 span that other "sync" frameworks generate. We could in theory generate a single span and update the same span when we notice async events like "send" (that send http response or parts of it back to client) but I think it's better to represent how the code actually works. IMO it is working correctly today for this use case.

Example HTTP trace representing a single HTTP request

code:

code ```python from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware from starlette.middleware import Middleware from starlette.applications import Starlette from starlette.responses import JSONResponse from starlette.routing import Route async def homepage(request): return JSONResponse({'hello': 'world'}) routes = [ Route("/", endpoint=homepage) ] app = Starlette(debug=True, routes=routes, middleware=[Middleware(OpenTelemetryMiddleware)]) ```
trace image

The two child spans represent ASGI callbacks that are responsible for sending data back to the client. The first send starts the response and sets headers. The second one returns the response body.

Web Sockets

On the other hand, websockets are generally used by applications that need to keep long lived connections open such chat applications, notification channels etc. The ASGI instrumentation treats this the same as HTTP. This results in the entire websockets connection being represented as a single trace. A chat application example with trace:

code ```python from opentelemetry.instrumentation.asgi import OpenTelemetryMiddleware from starlette.middleware import Middleware from starlette.applications import Starlette from starlette.endpoints import WebSocketEndpoint, HTTPEndpoint from starlette.responses import HTMLResponse from starlette.routing import Route, WebSocketRoute html = """ Chat

WebSocket Chat

""" class Homepage(HTTPEndpoint): async def get(self, request): return HTMLResponse(html) class Echo(WebSocketEndpoint): encoding = "text" async def on_receive(self, websocket, data): await websocket.send_text(f"Message text was: {data}") routes = [ Route("/", Homepage), WebSocketRoute("/ws", Echo) ] app = Starlette(routes=routes, middleware=[Middleware(OpenTelemetryMiddleware)]) ```
trace image

The parent SERVER span represents the the chat session (ws connection) being establish. Each couple of child receive+send spans represent a single chat message being sent to the server and the response returned by the server. While this can be useful in some cases, I don't think it's representative of the vast majority of websockets use cases. I think the server span should be created to represent the connection/session but each message request/response cycle should be represented as it's own trace. In other words, the above trace should be actually be 9 traces. The parent/server span from the trace should be an independent trace while as each pair of receive+send messages should form 8 more traces representing 8 chat messages (ws messages). Each receive (request) span should become a SERVER span with the send (response) span being it's child with kind set to INTERNAL.

I'm not sure if this can be done easily given the async nature of ASGI but it should be achievable when the a higher level framework like Starlette or FastAPI are used in combination with ASGI. To enable use cases like this, ASGI instrumentation should set the current context/span on the websocket message object (a dict). Instrumentations for higher level frameworks should then instrument concepts from those frameworks (handlers, views, etc) and use the context attached to the websocket messages when available.

cdvv7788 commented 3 years ago

@owais Possibly relevant for this issue: https://github.com/open-telemetry/opentelemetry-python-contrib/issues/467 Fastapi uses the asgi instrumentation under the hood, and it is having issues (on the http side).

github-actions[bot] commented 3 years ago

This issue was marked stale due to lack of activity. It will be closed in 30 days.

chicoxyzzy commented 1 month ago

I have the same issue. Is there any solution to this?