Open bheilbrun opened 1 year ago
That's an interesting one! I haven't checked `stream_with_context()` yet, but my gut feeling is that you could add a custom metric around the `generate()` function inside the request handler, and that should be timed correctly.
We could also look at adding some streaming-friendly wrappers to work with those handlers directly; I haven't looked into that yet.
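A rough sketch of what "a custom metric on the `generate()` function" could look like, using only the standard library. In real code you would observe the elapsed time into a `prometheus_client` `Summary` or `Histogram` instead of appending to a list; the decorator name and the metric stand-in here are illustrative assumptions, not part of prometheus_flask_exporter's API.

```python
import time
from functools import wraps

observed_durations = []  # stand-in for a prometheus_client Summary/Histogram

def time_generator(gen_func):
    """Wrap a generator function so the timing covers full stream production."""
    @wraps(gen_func)
    def wrapper(*args, **kwargs):
        # The clock starts when the wrapped generator is first advanced,
        # not when the view function returns it.
        start = time.perf_counter()
        try:
            yield from gen_func(*args, **kwargs)
        finally:
            # Records even if the consumer stops iterating mid-stream.
            observed_durations.append(time.perf_counter() - start)
    return wrapper

@time_generator
def generate():
    for _ in range(3):
        time.sleep(0.02)  # simulate slow chunk production
        yield b"chunk"

# Consuming the stream drives the timer from first chunk to exhaustion.
body = b"".join(generate())
print(f"streamed {len(body)} bytes in {observed_durations[0]:.4f}s")
```

Because the wrapper is itself a generator, the recorded duration spans the actual streaming work rather than the (near-instant) call that creates the generator.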
What's the best way to measure duration for streaming endpoints?
If I'm not mistaken, the current latency measurements don't work for streaming responses: prometheus_flask_exporter measures the time to return the response generator rather than the time to actually generate the streamed response.
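A minimal, framework-free illustration of the problem (plain Python; no Flask or prometheus_flask_exporter involved): timing the call that *returns* a generator measures almost nothing, because the generator body only runs while the stream is consumed.

```python
import time

def build_stream():
    """Stands in for a streaming view: returns a generator immediately."""
    def generate():
        for _ in range(3):
            time.sleep(0.05)  # simulate slow chunk production
            yield b"chunk"
    return generate()

# What a before_request/after_request pair effectively sees:
start = time.perf_counter()
stream = build_stream()
handler_time = time.perf_counter() - start  # generator created, nothing run yet

# What the client actually experiences:
start = time.perf_counter()
body = b"".join(stream)
stream_time = time.perf_counter() - start

print(f"handler returned in {handler_time:.4f}s, stream took {stream_time:.4f}s")
```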
Flask's streaming documentation gives an example of a streaming endpoint.
In this example, prometheus_flask_exporter would start a duration timer via `Flask.before_request` and then record the duration via `Flask.after_request`. When `after_request` is invoked, the actual response bytes haven't been generated or sent.

I wonder if measuring via `Flask.teardown_request` along with `stream_with_context()` would work, but I'm not sure. Thoughts appreciated!