open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
275 stars 175 forks source link

[http] Publish migration plan for HTTP semconv changes. #534

Open TylerHelmuth opened 1 year ago

TylerHelmuth commented 1 year ago

I was looking through https://github.com/open-telemetry/semantic-conventions/tree/main/docs/http#semantic-conventions-for-http and I am worried that users looking to understand how the breaking changes of the semantic convention will not have a clear path to understanding. I don't spend a lot of time in the spec/semconv, but I spend a lot of time in the project and I feel confused about certain things - I am worried about users who are purely consumers of OTEL.

My current questions:

Ultimately I would love to see a page dedicated to outlining the migration steps. As a user I want to see exactly what values changed, what was dropped, what was added, etc. I'd also want to see which languages support the new values, and any important deadlines.

I feel the blog post was really good and I know a lot of details went into the changelog, but both of those locations aren't great for discoverability. Important details in release notes, blog posts, and changelogs get moved downward as time goes on, making them harder to find. I really want a permanent, dedicated, discoverable space for us to communicate this changes to our end users.

Hopefully any patterns we do now can help with any future semconv breaking changes.

trask commented 1 year ago

I think https://github.com/open-telemetry/opentelemetry.io/issues/3554 is a great idea, thanks for opening that!

Is the warning in https://github.com/open-telemetry/semantic-conventions/tree/main/docs/http#semantic-conventions-for-http still accurate?

yes (was there something that led you to think it isn't accurate, so we can fix that?)

Is the Oct 1st 2023 deadline still applicable (meaning Instrumentation Libraries are free to drop the env var support with the next stable release?

yes, Instrumentation Libraries are free to drop the env var support in the next stable release now (but still should have implemented support for it in the current major version), I just sent #541 to remove mention of Oct 1 from the warning

As a user I want to see exactly what values changed, what was dropped, what was added, etc.

this was the goal of https://opentelemetry.io/blog/2023/http-conventions-declared-stable/#summary-of-changes, are you looking for something more than that, or just better discoverability of that information (i.e. https://github.com/open-telemetry/opentelemetry.io/issues/3554)?

I'd also want to see which languages support the new values

maybe we should introduce a compliance matrix in the semantic conventions repo similar to https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md?

any important deadlines

can you expand on what you have in mind? I'm not sure of any deadlines (other than Oct 1 which has passed now)

TylerHelmuth commented 1 year ago

yes (was there something that led you to think it isn't accurate, so we can fix that?)

My general confusion of the doc is that as a reader I'm not sure if it is applicable to me. It gives a warning about using v1.20.0 or prior, but to be able to answer that I have to know what my instrumentation is using. That's all possible to research, but it isn't something I expect the average OTel consumers to know. Ultimately it feels like a message for the SDK maintainers, and not something for an end user to consume (which is fine). But if an end user finds it I think it creates confusion. As an end user all I want to see is how the SDK I'm using is handling the breaking change.

https://opentelemetry.io/blog/2023/http-conventions-declared-stable/#summary-of-changes is what I'm looking for, but in a more permanent/discoverable place. My worry is that years down the road when some enterprise bumps their java/dotnet instrumentation from 1.0 to 2.0 they need to be able to find these changes or they'll feel pain. Of course we'd hope that they'd take advantage of the env var while in 1.0, but who knows.

maybe we should introduce a compliance matrix in the semantic conventions repo similar to https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md?

I think this is a great idea. Listing of the version where the env var was added and then removed, and which sdks have done that, would be very helpful.

I think a combination of the blog content and a matrix is exactly what end users will need to successfully navigate this change. And once we have those docs, updating the warning section to guide end users to those docs.

joaopgrassi commented 12 months ago

I think this is a great idea. Listing of the version where the env var was added and then removed, and which sdks have done that, would be very helpful.

I think a combination of the blog content and a matrix is exactly what end users will need to successfully navigate this change. And once we have those docs, updating the warning section to guide end users to those docs.

I think that's a great start, but the real problem I see are instrumentations. SDKs have I think little to do with the support of the env variable. What really affect users are the instrumentation libraries they are using, and those won't be reflected in such matrix. It is also impracticable to list in a matrix all instrumentations in the spec/semconv repo.

Maybe what we can do is request that each SIG/contrib repo that have HTTP instrumentations to highlight such thing? Because as a developer, say working with .NET/ASP.NET I know the OTel .NET repo and if I have an overview of which version is supporting which semconv I think that's a better migration. Maybe such thing can even be part of the OTel website, under each language, like:

image

cartermp commented 12 months ago

Mmm, I don't think this needs to be per-language. I would rather lean on release notes for the library instead, which can link to a more central migration page that explains what changed and how to approach it.

TylerHelmuth commented 11 months ago

It is also impracticable to list in a matrix all instrumentations in the spec/semconv repo.

If it is impractical to present this information in a centralized matrix then we should come up with a standardized way for each instrumentation package to present this information in a clear way. I'm not sure how each instrumentation package documents themselves so I'm not sure if it is better done in repos, the otel site, or both. And no matter what I think it is still a good idea to clearly document the changes and migration strategy here and reference it.

To reiterate the importance of this problem, I spent several hours today looking through OTel js, go, java, python, ruby, and dotnet trying to answer 2 questions:

  1. What Semantic Convention version is the language's http instrumentation using?
  2. Does the languange's instrumentation support OTEL_SEMCONV_STABILITY_OPT_IN?

It was so hard to find answers to question 1. Granted I spend a majority of my time in the Collector and not the OTel languages, but I feel pretty knowledgable on most of those languages and felt I'd be able to track down answers pretty easily - I was wrong. My fear is that when users go looking for this information they will also struggle.

I see that both Java and DotNet are preparing their repos to drop support for the old HTTP semantic convention. I am very worried that if we don't solve this communication problem before that happens that we'll have a lot of upset users.

jack-berg commented 11 months ago

I see that both Java and DotNet are preparing their repos to drop support for the old HTTP semantic convention. I am very worried that if we don't solve this communication problem before that happens that we'll have a lot of upset users.

Java is preparing to publish a 2.x version which produces the new stable HTTP conventions, but will continue to support the 1.x version some undecided period of time. I think I heard > 6 months today. The 1.x version produces the old HTTP conventions with the ability to opt into the new ones.

cc @open-telemetry/java-instrumentation-maintainers

TylerHelmuth commented 11 months ago

Java is preparing to publish a 2.x version which produces the new stable HTTP conventions, but will continue to support the 1.x version some undecided period of time.

Perfect. This is the exact type of information I want to make it sooo easy for users to discover no matter if they are using apache-httpclient, or java-http-client or reactor or whatever.

I'm picking on Java because it is a trend-setter for OTel. And I don't mean to imply that the Java SIG wasn't going to share this information in the changelog/releasenotes/somewhere else. But my hope is that we solve this problem as a whole - creating a solution that will be useful for users of any language no matterthe language's current semconv version.

This is top of my mind right now so I've add this as a topic for the next semconv SIG meeting.

trask commented 11 months ago

It is also impracticable to list in a matrix all instrumentations in the spec/semconv repo.

I think a matrix of all HTTP instrumentations published under the OpenTelemetry GitHub org could be feasible, and could be useful for our own internal tracking too (the third phase of the HTTP semconv stability WG is supposed to include driving these changes across existing implementations).

I'll explore what that might look like.

The question remains for me though where it is most likely that users would look for this information, and I tend to think that the first place they would look is the release notes for the given instrumentation library(ies) they are using.

trask commented 11 months ago

I'll explore what that might look like.

thought on #571?

TylerHelmuth commented 11 months ago

The question remains for me though where it is most likely that users would look for this information, and I tend to think that the first place they would look is the release notes for the given instrumentation library(ies) they are using.

I agree that we definitely want something in release notes (and I think how it is phrased should be standardized between languages.) But since it is a lot of information to convey, I support creating content that each changelog could link back to that details the specific changes that occurred.

TylerHelmuth commented 11 months ago

thought on https://github.com/open-telemetry/semantic-conventions/pull/571?

Love it. I recognize that something like that matrix adds cognitive load to OTel maintainers, but I think it is worth it.

joaopgrassi commented 11 months ago

Looks good, but I'm just afraid of how/who will be responsible of keeping this up-to-date? Our compatibility matrix is often outdated (SIG maintainers don't always come to tick the box when they add a feature. We, in the semconv group, can't know everything that goes on in each SIG), imagine the vast amount of instrumentations. I liked the idea of having a template, that maybe is maintained by the semconv repo, but that then each instrumentation is required to add to their README somehow? I feel this way the chance of things being outdated is less.

TylerHelmuth commented 11 months ago

I liked the idea of having a template, that maybe is maintained by the semconv repo, but that then each instrumentation is required to add to their README somehow?

We should definitely do this. I view this as the bare minimum we (semconv sig) should be doing for users to ensure the release notes make it clear that the breaking changes affect end users.

TylerHelmuth commented 11 months ago

Possible Release note template:

Breaking Changes

The following {{library names}} now emit only the stable HTTP semantic conventions:

Impact

This breaking change updates these libraries to emit new attribute names, such as http.response.status_code instead of http.status_code. If you already depend on the old names of the attributes emitted by these libraries in down downstream locations, such as collector configuration, alerts, or dashboards you will be affected by this change.

What is Changing

See HTTP semantic conventions declared stable for full details.

Common attributes across HTTP client and server spans

Change Comments
http.methodhttp.request.method Now captures only 9 common HTTP methods by default (configurable) plus _OTHER
http.status_codehttp.response.status_code
http.request.header.<key> • Dash ("-") to underscore ("_") normalization in <key> has been removed
• On HTTP server spans: now must be provided to sampler
http.response.header.<key> Dash ("-") to underscore ("_") normalization in <key> has been removed
http.request_content_lengthhttp.request.body.size • Recommended → Opt-In
Not marked stable yet
http.response_content_lengthhttp.response.body.size • Recommended → Opt-In
Not marked stable yet
user_agent.original • On HTTP client spans: Recommended → Opt-In
• On HTTP server spans: now must be provided to sampler
• See note if migrating from <= v1.18.0
net.protocol.namenetwork.protocol.name Recommended → Conditionally required if not http and network.protocol.version is set
net.protocol.versionnetwork.protocol.version • Examples fixed: 2.02 and 3.03
• See note if migrating from <= v1.19.0
net.sock.family Removed
net.sock.peer.addrnetwork.peer.address On HTTP server spans: if http.client_ip was unknown, then also net.sock.peer.addrclient.address; client.address must be provided to sampler
net.sock.peer.portnetwork.peer.port Now captured even if same as server.port
net.sock.peer.name Removed
New: http.request.method_original Only captured when http.request.method is _OTHER
New: error.type

References:

HTTP client span attributes

Change Comments
http.urlurl.full
http.resend_counthttp.request.resend_count
net.peer.nameserver.address
net.peer.portserver.port Now captured even when same as default port for scheme

{.td-initial .table .table-responsive .ot-table-first-row-60}

References:

HTTP server span attributes

Change Comments
http.route No change
http.targeturl.path and url.query Split into two separate attributes
http.schemeurl.scheme Now factors in [X-Forwarded-Proto][], [Forwarded#proto][] headers
http.client_ipclient.address If http.client_ip was unknown (i.e., no [X-Forwarded-For][], [Forwarded#for][] headers), then net.sock.peer.addrclient.address; now must be provided to sampler
net.host.nameserver.address Now based only on Host, :authority, [X-Forwarded-Host][], [Forwarded#host][] headers
net.host.portserver.port Now based only on Host, :authority, [X-Forwarded-Host][X-Forwarded-Host], [Forwarded#host][] headers

References:

HTTP client and server span names

The {http.method} portion of span names is replace by HTTP when {http.method} is _OTHER.

See note if migrating from <= v1.17.0.

References:

HTTP client duration metric

Metric changes:

Attribute change Comments
http.methodhttp.request.method Now captures only 9 common HTTP methods by default plus _OTHER
http.status_codehttp.response.status_code
net.peer.nameserver.address
net.peer.portserver.port Now captured even when same as default port for scheme
net.sock.peer.addr Removed
net.protocol.namenetwork.protocol.name Recommended → Conditionally required if not http and network.protocol.version is set
net.protocol.versionnetwork.protocol.version Examples fixed: 2.02 and 3.03; see note if migrating from <= v1.19.0
New: error.type

References:

HTTP server duration metric

Metric changes:

Attribute change Comments
http.route No change
http.methodhttp.request.method Now captures only 9 common HTTP methods by default plus _OTHER
http.status_codehttp.response.status_code
http.schemeurl.scheme Now factors in [X-Forwarded-Proto span][X-Forwarded-Proto], [Forwarded#proto span][Forwarded#proto] headers
net.protocol.namenetwork.protocol.name Recommended → Conditionally required if not http and network.protocol.version is set
net.protocol.versionnetwork.protocol.version Examples fixed: 2.02 and 3.03; see note if migrating from <= v1.19.0
net.host.nameserver.address • Recommended → Opt-In (due to high-cardinality vulnerability since based on HTTP headers)
• Now based only on Host span, :authority span, [X-Forwarded-Host span][X-Forwarded-Host], [Forwarded#host span][Forwarded#host] headers
net.host.portserver.port • Recommended → Opt-In (due to high-cardinality vulnerability since based on HTTP headers)
• Now based only on Host span, :authority span, [X-Forwarded-Host span][X-Forwarded-Host], [Forwarded#host span][Forwarded#host] headers
New: error.type

References:

[Forwarded#for]: https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#for [Forwarded#proto]: https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#proto [Forwarded#host]: https://developer.mozilla.org/docs/Web/HTTP/Headers/Forwarded#host [X-Forwarded-For]: https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-For [X-Forwarded-Proto]: https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-Proto [X-Forwarded-Host]: https://developer.mozilla.org/docs/Web/HTTP/Headers/X-Forwarded-Host

trask commented 11 months ago

Please check out #612

TylerHelmuth commented 10 months ago

We discussed the impact of the http semantic convention changes to the Operator during our SIG call. The consensus was that it is worrying haha The operator adds an extra layer of obfuscation to all the changes, which makes the release notes of the languages even more important. I'd like to continue to push for a standardized template for all languages to use. I also still push for each language to make it abundantly clear exactly which libraries are affected.

For the operator specifically we've created https://github.com/open-telemetry/opentelemetry-operator/issues/2542 to track our discussion on how we'll help our users understand the impact of these changes.

TylerHelmuth commented 10 months ago

I also still push for each language to make it abundantly clear exactly which libraries are affected.

On this topic I'd very much like to revisit the idea of a matrix. We created https://docs.honeycomb.io/getting-data-in/semconv/ and would very much like to contribute it somewhere(s) meaningful.

joaopgrassi commented 10 months ago

On this topic I'd very much like to revisit the idea of a matrix. We created docs.honeycomb.io/getting-data-in/semconv and would very much like to contribute it somewhere(s) meaningful.

I think that is a great starting point to adopt/follow. But I still think this should not be placed/managed in the semconv repo, but in each SDK/Instrumentation repo.

TylerHelmuth commented 10 months ago

but in each SDK/Instrumentation repo.

I thinking about this problem more and more I agree. As an end users, I should be able to look at any changelog/release and instantly comprehend that there are breaking changes that affect me. I should also be able to look at any instrumentation's documentation and instantly understand if it is emitting stable http semantic conventions or allows the use of the OTEL_SEMCONV_STABILITY_OPT_IN env variable.

I am also in favor of a centralize location that can provide a quick guide to compatibility like the matrices we published. These could link to the more specific documentation. They could also live at some top level in the repos or on each languages doc page in the otel site. Basically I want users to see this information no matter where they go investigating the language/library.

I recognize that this adds extra maintenance to the instrumentation maintainers, but I think we owe it to our end users. We need to be able to make these important breaking changes, but we also must make it clear how it impacts users.

jsuereth commented 9 months ago

We discussed this a bit in the TC. Regarding having every language SiG provide consistent documentation on what has been broken, the following is true:

Going forward, we're going to institute a policy where semconv WG's (and semconv itself) directly notifies language SIGs (via opened issues) on breaking changes and expectations. For HTTP, e.g., this bug would include the details on what changed in semconv (from the blog post) and would ask Language SIGs to update their instrumetnation to be in compliance. We can also call out central documentation areas where languages can contribute their migration paths. Then it would be up to language SiGs to fill out information as they perform the migration.

We do not think we can centrally own the migration, instead relying on the current Specification Change -> Language/Collector SiG notification channel, so let's improve how that communication happens.