Open bheilbrun opened 1 year ago
That's an interesting one! I haven't checked `stream_with_context()` yet, but my gut feeling is that you could add a custom metric around the `generate()` function inside the request handler, and that should be timed correctly.
We could also look at adding some streaming-friendly wrappers to work with those handlers directly; I haven't looked into that yet.
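A rough sketch of what "a custom metric on the `generate()` function" could look like, using only the standard library. In real code you would observe the elapsed time into a `prometheus_client` `Summary` or `Histogram` instead of appending to a list; the decorator name and the metric stand-in here are illustrative assumptions, not part of prometheus_flask_exporter's API.

```python
import time
from functools import wraps

observed_durations = []  # stand-in for a prometheus_client Summary/Histogram

def time_generator(gen_func):
    """Wrap a generator function so the timing covers full stream production."""
    @wraps(gen_func)
    def wrapper(*args, **kwargs):
        # The clock starts when the wrapped generator is first advanced,
        # not when the view function returns it.
        start = time.perf_counter()
        try:
            yield from gen_func(*args, **kwargs)
        finally:
            # Records even if the consumer stops iterating mid-stream.
            observed_durations.append(time.perf_counter() - start)
    return wrapper

@time_generator
def generate():
    for _ in range(3):
        time.sleep(0.02)  # simulate slow chunk production
        yield b"chunk"

# Consuming the stream drives the timer from first chunk to exhaustion.
body = b"".join(generate())
print(f"streamed {len(body)} bytes in {observed_durations[0]:.4f}s")
```

Because the wrapper is itself a generator, the recorded duration spans the actual streaming work rather than the (near-instant) call that creates the generator.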
What's the best way to measure duration for streaming endpoints?
If I'm not mistaken, the current latency measurements don't work for streaming responses: prometheus_flask_exporter measures the time to return the response generator rather than the time to actually generate the streamed response.
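A minimal, framework-free illustration of the problem (plain Python; no Flask or prometheus_flask_exporter involved): timing the call that *returns* a generator measures almost nothing, because the generator body only runs while the stream is consumed.

```python
import time

def build_stream():
    """Stands in for a streaming view: returns a generator immediately."""
    def generate():
        for _ in range(3):
            time.sleep(0.05)  # simulate slow chunk production
            yield b"chunk"
    return generate()

# What a before_request/after_request pair effectively sees:
start = time.perf_counter()
stream = build_stream()
handler_time = time.perf_counter() - start  # generator created, nothing run yet

# What the client actually experiences:
start = time.perf_counter()
body = b"".join(stream)
stream_time = time.perf_counter() - start

print(f"handler returned in {handler_time:.4f}s, stream took {stream_time:.4f}s")
```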
Flask's streaming documentation gives an example of a streaming endpoint.
In this example, prometheus_flask_exporter would start a duration timer via `Flask.before_request` and then record the duration via `Flask.after_request`. When `after_request` is invoked, the actual response bytes haven't been generated or sent.

I wonder if measuring via `Flask.teardown_request` along with `stream_with_context()` would work, but I'm not sure. Thoughts appreciated!