vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Add metric that matches the duration in access log #26410

Status: Open. jobergum opened this issue 1 year ago.

jobergum commented 1 year ago

Due to https://github.com/vespa-engine/vespa/issues/26408, the duration logged in the access log can be considerably higher than query_latency. Another contributing factor is that the access log duration field includes the time it takes to render the response.

I think we should have a metric that corresponds exactly to the duration logged in the access log; support for percentile calculations would be a bonus.

yngveaasheim commented 1 year ago

Agreed. The container.handled.latency.sum metric with the correct handler tag should be a much closer match here. There are no percentiles for it, though, and I believe we should use histograms rather than percentiles going forward.
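One common argument for histograms over precomputed percentiles is that bucket counts from different nodes or time intervals can simply be added together, whereas percentiles cannot be meaningfully averaged. The minimal Java sketch below illustrates that idea with a fixed-bucket latency histogram. The class name, bucket bounds, and methods are hypothetical and do not represent Vespa's metrics API.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only (not Vespa's metrics API): a fixed-bucket latency histogram.
 * Bucket i counts samples with latency <= BOUNDS_MS[i]; the last bucket is an overflow bucket.
 */
public class LatencyHistogram {
    static final long[] BOUNDS_MS = {5, 10, 25, 50, 100, 250, 500, 1000};

    final long[] counts = new long[BOUNDS_MS.length + 1];

    /** Adds one sample to the first bucket whose upper bound covers it. */
    void record(long latencyMs) {
        int i = 0;
        while (i < BOUNDS_MS.length && latencyMs > BOUNDS_MS[i]) i++;
        counts[i]++;
    }

    /** Histograms from different nodes or intervals combine by adding bucket counts. */
    void merge(LatencyHistogram other) {
        for (int i = 0; i < counts.length; i++) counts[i] += other.counts[i];
    }

    long total() {
        return Arrays.stream(counts).sum();
    }

    /**
     * Upper bound (ms) of the bucket containing the q-quantile, for 0 < q <= 1.
     * Returns Long.MAX_VALUE if the quantile falls in the overflow bucket.
     */
    long quantileUpperBound(double q) {
        if (total() == 0) throw new IllegalStateException("no samples recorded");
        long rank = (long) Math.ceil(q * total());
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) return i < BOUNDS_MS.length ? BOUNDS_MS[i] : Long.MAX_VALUE;
        }
        return Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        LatencyHistogram nodeA = new LatencyHistogram();
        LatencyHistogram nodeB = new LatencyHistogram();
        for (long ms : new long[] {3, 8, 12, 30, 45}) nodeA.record(ms);
        for (long ms : new long[] {7, 9, 60, 120, 800}) nodeB.record(ms);
        nodeA.merge(nodeB); // nodeA now holds the combined distribution
        System.out.println("samples: " + nodeA.total());
        System.out.println("p50 <= " + nodeA.quantileUpperBound(0.50) + " ms");
        System.out.println("p90 <= " + nodeA.quantileUpperBound(0.90) + " ms");
    }
}
```

For the ten sample latencies in `main`, the merged histogram reports p50 <= 25 ms and p90 <= 250 ms. The result is only as precise as the chosen bucket bounds, which is the usual trade-off for mergeable histograms.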

bjorncs commented 1 year ago

https://github.com/vespa-engine/vespa/pull/27120 changes StatisticsSearcher to use the correct request timestamp. The previous timestamp did not account for the initial request processing in Jetty.

The handled.latency metric covers the latency up to the start of the response; it does not include the time spent producing and sending the response content. The serverTotalSuccessfulResponseLatency and serverTotalFailedResponseLatency metrics are the only latency metrics that match the duration in the access log.
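To make the measurement windows in this comment concrete, here is a hypothetical Java sketch of one request's timeline. It assumes that handled.latency and the access log duration both start when Jetty accepts the request, and it models the old StatisticsSearcher behaviour as a start timestamp taken after Jetty's initial processing. The class, record, and field names are illustrative assumptions, not Vespa code.

```java
/**
 * Illustrative only: a hypothetical per-request timeline (milliseconds) used to
 * compare measurement windows. None of these names are Vespa APIs.
 */
public class LatencyWindows {

    /** Hypothetical timestamps for one request. */
    record Timeline(
            long acceptedMs,        // request accepted by Jetty (assumed common start point)
            long afterJettyMs,      // after Jetty's initial request processing (assumed stale start)
            long responseStartMs,   // response starts to be written
            long responseDoneMs) {} // response content fully produced and sent

    /** Window up to the start of the response, as described for handled.latency. */
    static long handledLatency(Timeline t) {
        return t.responseStartMs() - t.acceptedMs();
    }

    /** Full window including rendering and sending, as in the access log duration. */
    static long accessLogDuration(Timeline t) {
        return t.responseDoneMs() - t.acceptedMs();
    }

    /** Same end point as handledLatency, but measured from the later (stale) start timestamp. */
    static long latencyFromStaleStart(Timeline t) {
        return t.responseStartMs() - t.afterJettyMs();
    }

    public static void main(String[] args) {
        Timeline t = new Timeline(0, 5, 40, 65);
        System.out.println("handled.latency window:  " + handledLatency(t) + " ms");
        System.out.println("access log duration:     " + accessLogDuration(t) + " ms");
        System.out.println("window from stale start: " + latencyFromStaleStart(t) + " ms");
    }
}
```

With these assumed timestamps, the access log duration (65 ms) exceeds the handled.latency window (40 ms) by the 25 ms spent producing and sending the response, and starting the clock after Jetty's initial processing hides a further 5 ms.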

bjorncs commented 1 year ago

Resetting priority. This is part of a larger effort to evaluate all existing container metrics.