Proposal - HTTP request count semantic convention

nerdondon commented 1 year ago

What are you trying to achieve?

Add instruments for HTTP client and server request counts.

Additional context.

The current HTTP semantic conventions only has an instrument for active requests (http.server.active_requests). This proposal is to add counters like http.server.request.count and http.client.request.count. This seemed to be part of the original PR for HTTP semantic conventions as http.{type}.requests but was lost somehow. I think this is a prime candidate to be codified as a semantic convention because of very similar instrumentation across service meshes:

This metric is also important for capturing QPS and deriving error rate.

If this is desirable, I think I would be willing to take a crack at adding it.

mateuszrzeszutek commented 1 year ago

The http.server.duration metric is a histogram, and histograms capture the total count of observations. Does that not work for you?

tsloughter commented 1 year ago

An count aggregator would be useful for instances like this.

I've been meaning to send a PR to suggest removing this from the metrics SDK spec:

Customize the aggregation - if the default aggregation associated with the Instrument does not meet the needs of the user. For example, an HTTP client library might expose HTTP client request duration as Histogram by default, but the application developer might only want the total count of outgoing requests.

Because it is not currently supported to aggregate to just a count -- I guess except for a 1 bucket histogram, which isn't really intuitive or what this is really saying.

nerdondon commented 1 year ago

The http.server.duration metric is a histogram, and histograms capture the total count of observations. Does that not work for you?

@mateuszrzeszutek that seems be fine if not somewhat non-ergonomic as @tsloughter is pointing out. A developer in this case would have to report histogram measurements exactly or override and use a 1 bucket histogram.

@tsloughter I'm not sure what the exact issue is regarding supporting aggregation to a count though. Can you elaborate? I thought the histogram should just expose a property count that represents the total population of points.

Edit: Also, wanted to ask what it would look like for a client that wanted to report different sets of attributes between duration and count. Would they have to keep two different histogram measures of duration? It still seems more ergonomic that there is a dedicated count instrument.

Additionally, from the telemetry consumption side, it seems that there should be a defined count instrument. That way, clients claiming compliance with HTTP semantic conventions would need to provide a request count instrument.

cc: @jsuereth

tsloughter commented 1 year ago

@nerdondon no issue, I think a count aggregator should be proposed. I guess my side note about the spec muddied that :)

nerdondon commented 1 year ago

@SergeyKanzhelev or @jsuereth sorry for the bump but i just wanted to see ask if there was something I could do to get more movement on this?

chameleon82 commented 1 year ago

In my preference to name that metric as rate: http.server.rate, http.client.rate

As user I'm interested in current successful rate and response rate. Means every metric should send both values one with tag status=success and with tag status=fail corresponding to both Success Rate and Error Rate metrics I can visualize with tools like Grafana.

as an alternative naming can follow http.{server|client}.rate.{success|error}

However definition of the error may be vary ( timeouts / 5xx errors or status >= 400 ) and must be specified as well.

Example of metrics with grafana tooling: https://grafana.com/grafana/plugins/novatec-sdg-panel/

Response Time            | in_timesum
Request Rate             | in_count
Error Rate               | error_in
Response Time (Outgoing) | out_timesum
Request Rate (Outgoing)  | out_count
Error Rate (Outgoing)    | error_out

nerdondon commented 1 year ago

In my understanding of the current structure of semantic conventions, a proposal would involve the standardization of a particular instrument. In the case of my proposed the count, this is just a counter from the OTel API. I'm not sure what that would look like with a rate. In any case, rate is an aggregation over the instrument. Having the base instrument would allow other aggregations as desired. Also, note the prior art that I linked with regard to this proposal. Using a count, would enable an easier path to adoption of the conventions.

I don't want to muddy the waters here by bringing in a discussion on an attribute for status class (there's already another issue regarding that IIRC). This is specifically about an operations instrument count instrument that can be used to fulfill part of the use case you mentioned and others.

tsloughter commented 1 year ago

I don't think an instrument is needed, only a Count aggregation.

RangelReale commented 1 year ago

+1 for this, I'm using Datadog and I'm seeing no way of extracting the count of the duration to do this metric.

trask commented 1 year ago

+1 for this, I'm using Datadog and I'm seeing no way of extracting the count of the duration to do this metric.

it may be worth asking Datadog about this, since I believe other backends are getting the request count from the http.server.duration metric https://github.com/open-telemetry/semantic-conventions/issues/1362

chameleon82 commented 2 months ago

With current conventions most of our concerns are solved with:	Metric	PromQL example
Latency, P99	`histogram_quantile(0.99, http.server.request.duration{})`
Request Rate	`count(http.server.request.duration{})`
Rate Increase	`rate(http.server.request.duration{})`
Error Rate	`(count(http.server.request.duration{ http.response.status_code =~ "5.*"}) or vector(0)) / count(http.server.request.duration})`
Inflight requests	`http.server.active_requests{}`

I think it worth to have some documentation on how OTEL metric can be converted to operational metrics

open-telemetry / semantic-conventions

Proposal - HTTP request count semantic convention #1362