open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
244 stars 158 forks source link

Consider `http.status_code_class` attribute #1056

Open alanwest opened 1 year ago

alanwest commented 1 year ago

Inspired by https://github.com/open-telemetry/opentelemetry-specification/pull/2943#issuecomment-1312400189.

because metrics are sensitive to cardinality, I've seen instrumentations using strings like 4xx, 5xx for status code.

Proposal is to add a new attribute for grouping status codes by class, i.e., 1xx, 2xx, 3xx, etc. See: https://datatracker.ietf.org/doc/html/rfc9110#section-15.

Open questions:

codeboten commented 1 year ago

Should the value of the attribute be 1xx, 2xx, etc OR informational, successful?

I would prefer grouping starting with the first digit of the status code class.

The httpcheck receiver in the collector currently reports http.status_class as an attribute on some metrics it produces https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/httpcheckreceiver#http-check-receiver

joaopgrassi commented 1 year ago

For metrics, is this attribute conditionally required if status is present?

I guess that would make more sense than making status_code optional, which I'm not sure would even be possible since it is conditionally required today. I'd also think the value being the first digit is a more "natural" solution.

carlosalberto commented 1 year ago

Hey @alanwest - I suggest you bring it to the Semantic Conventions SIG (every Monday after the maintainers meeting).

trask commented 1 year ago

@alanwest @codeboten @joaopgrassi please let us know if you think this should be considered as a blocker for HTTP semantic convention stability? my current assumption is that http.status_code is low-cardinality enough for most usage, and http.status_code_class could be added later as an opt-in attribute (and users of that attribute could opt-out of of http.status_code)

alanwest commented 1 year ago

please let us know if you think this should be considered as a blocker for HTTP semantic convention stability?

@trask I am not inclined to consider this a blocker for stability. I opened this issue based on https://github.com/open-telemetry/opentelemetry-specification/pull/2943#issuecomment-1312400189.

@yurishkuro do you feel this should be a blocker?

edit: just to expand a bit on why I do not think this is a blocker.. I personally think that a good default behavior is to include the full status code for metrics. For users who do experience cardinality issues due to status code, instrumentation might be able to be configured to turn off http.status_code and enable http.status_code_class. In this way, I believe http.status_code_class could be an additive, non-breaking change in the future.

Oberon00 commented 1 year ago

What is our general stance one such "trivially derivable" attributes? The http.status_code_class attribute will always be redundant with the (required) http.status_code. It seems conceptually useless to report this from the agent/client side already, but it might be useful in processing further down the line (maybe as soon as in some aggregating exporter in-process).

I feel like this needs some more general discussion

trask commented 1 year ago

I personally think that a good default behavior is to include the full status code for metrics. For users who do experience cardinality issues due to status code, instrumentation might be able to be configured to turn off http.status_code and enable http.status_code_class. In this way, I believe http.status_code_class could be an additive, non-breaking change in the future.

my thought also 👍

@Oberon00 do you think this needs more discussion before removing it as a blocker for HTTP semconv stability?

Oberon00 commented 1 year ago

Offering status_code_class as an alternative to status_code implies making status_code non-required (relaxing the condition). Wouldn't that be a breaking change? Just one example: backends may rely on presence of status_code to detect network errors, as discussed in open-telemetry/opentelemetry-specification#3253

trask commented 1 year ago

Wouldn't that be a breaking change?

ya, makes sense, as soon as you opt-out of http.status_code you would no longer be producing HTTP semconv compliant telemetry, which means we wouldn't be able to define this kind of optional behavior in the HTTP semconv spec.

I added a comment https://github.com/open-telemetry/opentelemetry-specification/pull/3225#discussion_r1122619001

I see a couple of options:

trask commented 1 year ago

We discussed this in the HTTP semconv meeting today.

I have sent open-telemetry/opentelemetry-specification#3289 to clarify that

Attribute requirement levels apply to instrumentation. Because users can transform their telemetry in a number of ways (e.g. metric views, span processors, and collector transformations), these requirement levels cannot be relied on by telemetry consumers.

I think this gives the flexibility to make something like http.status_code_class Opt-In (and http.status_code Opt-Out) in the future.