Open trask opened 1 year ago
Logs, metrics, and traces should share common attributes, but many of the attributes are applicable only to logs, given that they're verbose, sometimes duplicate information, and require additional logic/resources to populate.
We should either bring attributes which are not used for traces/metrics as log-only attributes (initially) or make them opt-in by default.
Some initial thoughts after discussing in HTTP semconv stability WG today:
- url.fragment: this is missing today and we will add it to HTTP semconv
- url.password and url.username: we could add these to HTTP semconv as Opt-In attributes (maybe also as deprecated, since these URL components have been deprecated)
- url.extension: since this can be derived from url.path / url.full, we probably wouldn't want to capture it by default, but it could be added to HTTP semconv as an Opt-In attribute
- url.domain and url.port: HTTP semconv already captures these in server.domain and server.port (#3402). We would like to avoid multiple ways to capture the same thing, to make life easier on backends and queries/alerting. Some options: pull these over when they are needed for a semantic convention other than HTTP, or pull them in now as Opt-In attributes for HTTP semconv (as duplicates of server.domain and server.port in the context of HTTP semconv)
- url.registered_domain, url.subdomain, url.top_level_domain: since these can be derived from url.domain, we probably wouldn't want to capture them by default, but they could be added to HTTP semconv as Opt-In attributes
- url.original: I'm not sure what to do with this attribute. We can already see that having both url.full and url.original is confusing, and we would prefer to force HTTP client instrumentation to construct url.full and HTTP server instrumentation to split out url.path / url.query instead.
@trask Thanks for creating this separate thread to discuss this!
I think the above really depends a lot on how we see the semantic conventions, i.e. in a very concrete context / scope (such as HTTP spans) or rather generic and signal agnostic. I know it's quite a fundamental question / topic, but I feel discussing it would help ECS alignment efforts going forward.
First of all, I think that one of the biggest values of the semantic conventions is the correlation of different data points and signals (especially with OTel, apart from tracing, expanding more and more into metrics, logs and other signals). So I fully agree with @tigrannajaryan's comment, i.e. different signals (especially traces and logs) should share as many semantic conventions as possible (unless there are good reasons not to do so, such as cardinality for metrics, etc.). Therefore, IMHO by default we should consider attributes and their namespaces as signal agnostic (and rather make exceptions for attributes that should not apply to certain signals, such as metrics, or should apply only to a specific signal).
For example, in your comment above:
url.domain and url.port - HTTP semconv already captures these in server.domain and server.port (https://github.com/open-telemetry/opentelemetry-specification/pull/3402). We would like to avoid multiple ways to capture the same thing, to make life easier on backends and queries/alerting. Some options: pull these over when they are needed for another semantic convention other than HTTP, or pull them in now as Opt-In attributes for HTTP semconv (as duplicates of server.domain and server.port in the context of HTTP semconv)
This is reasonable when considering just HTTP spans as the scope. But in some logging and security use cases, mixing up url.* and server.* attributes might be confusing or even wrong. As the namespace implies, all the url.* attributes describe a URL string in detail, while server.* fields describe the responder in a network connection. In the case of a non-HTTP connection, url.* might not be applicable, while server.* is.
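To illustrate the "url.* describes a URL string" view, here is a rough Python sketch of how several of the url.* attributes discussed in this thread could be derived purely from a URL string with the standard library's urllib.parse. This is only an illustration under my own assumptions: the helper name url_attributes and the example URL are made up, and this is not taken from any OTel SDK or from the ECS implementation.

```python
import posixpath
from urllib.parse import urlsplit


def url_attributes(full_url: str) -> dict:
    """Derive url.* attributes from a URL string.

    Hypothetical helper for illustration only; not an OTel or ECS API.
    """
    parts = urlsplit(full_url)
    attrs = {
        "url.full": full_url,
        "url.domain": parts.hostname,    # host component of the URL string
        "url.port": parts.port,          # None when the URL has no explicit port
        "url.path": parts.path,
        "url.query": parts.query,
        "url.fragment": parts.fragment,
        "url.username": parts.username,  # deprecated URL components
        "url.password": parts.password,
    }
    # url.extension: taken from the last path segment, without the leading dot.
    ext = posixpath.splitext(parts.path)[1]
    attrs["url.extension"] = ext[1:] if ext else None
    # Keep only the components that are actually present in the URL.
    return {k: v for k, v in attrs.items() if v not in (None, "")}


attrs = url_attributes("https://user:secret@www.example.com:8443/docs/index.html?q=1#top")
```

Nothing here looks at the network connection, which is the point: server.* would have to come from the connection itself. Note that url.registered_domain, url.subdomain and url.top_level_domain are left out on purpose, since splitting a domain that way needs the Public Suffix List and cannot be done reliably with the standard library alone.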
Happy to discuss this topic in more detail!
I like the proposal of using the Opt-In option when modeling redundancy, though!
We should either bring attributes which are not used for traces/metrics as log-only attributes (initially) or make them opt-in by default.
Yes, sure, they should be opt-in; there is no reason to make these additional attributes required. I maintain that we should aim for uniformity of conventions for logs and traces (exceptions from this need an explanation). Traces don't have to use these attributes; they are just optional attributes that can be used if the user chooses so.
We discussed this in the HTTP semconv WG meeting today:
I'm removing the HTTP semconv stability blocker tag from this issue (~I'll open a separate issue for url.fragment and tag that as an HTTP semconv stability blocker~ https://github.com/open-telemetry/opentelemetry-specification/pull/3355#issuecomment-1515438316).
Consider pulling in some or all of:
url.domain
url.extension
url.fragment
url.original
url.password
url.port
url.registered_domain
url.subdomain
url.top_level_domain
url.username
Based on @AlexanderWert's https://github.com/open-telemetry/opentelemetry-specification/pull/3355#issuecomment-1510995857
And @tigrannajaryan's https://github.com/open-telemetry/opentelemetry-specification/pull/3355#issuecomment-1511625124: