Closed bogdandrutu closed 2 years ago
Add "attributes" to the new Scope concept.
This goes a step further than was discussed in the last spec SIG meeting. A smaller step would be to only add a "subscope_name" as an additional field to "instrumentation_library_name/version" and call the message with these 3 fields a Scope. But perhaps "attributes" field+semantic conventions for scopes is a better approach if we can come with other use-cases when recording additional attributes is necessary.
Probably it is fine to relax the need to have this as defined right now "instrumentation library name" and suggest that as a "MAY".
I think this one people disagree with because this dilutes the semantics of the existing API. I tend to agree with this position. I think we need to keep existing TracerProvider.get()
API's semantics unchanged since it is declared stable. It needs to populate instrumentation library name and version.
We can add something like TracerProvider.getScoped()
which will accept the new set of parameters (either libname/ver/subscope or just "attributes").
I like this proposal. It solves the problem posed in https://github.com/open-telemetry/oteps/issues/78.
@jmacd I wish I would have understood better your proposal back then, anyway good time now to fix that.
I generally like this approach, as a scope would give us a place to store instrumentation library as well as additional metadata. There was a lengthy discussion at the spec SIG where concerns were raised around backwards compatibility, and ways to work around incompatibilities were suggested. Is there any chance we can document what the concerns were and how they could be resolved?
I can document at least my concern:
I am concerned that expanding what the name is allowed to be will lose the information of which library actually generated a log line. Loggers can easily be named "http"
when they are monitoring HTTP requests (rather than the "@opentelemetry/instrumentation-http"
that we have for traces and metrics). If the same concept of "scope" is used with the logging, tracing, and metrics APIs this misuse will inevitably leak into the trace and metric API use.
IIRC @bogdandrutu's proposed resolution (and please bogdan, expand on this if I am misstating) was that the scope would typically be the fully qualified class name of the instrumentation class anyway, because that is how he has seen logger name use "in the wild" and he doesn't expect it to be a problem. Instrumentation libraries would use their own fully qualified class names and first party instrumentations would use their own fully qualified class names. On the call today he seemed to believe that calling the field "instrumentation library name" was confusing to users, particularly in the case where the instrumentation library and the instrumented library are the same (first party instrumentation).
I, (and I believe @Aneurysm9 although I would let him speak for himself) am not completely convinced by this. At least in JavaScript I have not seen any consistent use of logger name, and certainly would not expect fully qualified class name to be commonly used as a logger name (JavaScript doesn't even have a common idiom for globally referencing a class anyway). Some popular JS logging libraries (winston, npmlog) don't even have the concept of a logger name, and those that do offer little to no guidance as to what a logger name should be (bunyan, pino)
I do share Dan's concern that assumptions about user behavior are being made without adequate evidentiary support. I also think that, even if those assumptions are correct, this solution is overly broad and does more than it needs to do.
It seems that the current concern is that InstrumentationLibrary
as a concept does not fit some expected/traditional uses because "Library" is too large a scope for what some developers would expect to use to identify the logger they use. If a narrower scope is expected then I do not see why we need to completely upend the structure and semantics of the InstrumentationLibrary
data in OTLP or the semantics of the existing APIs.
It is proposed that:
There are few places where changes are necessary:
The OTLP protocol can be changed following a "backwards" compatible approach:
- Rename "InstrumentationLibrary" to "Scope".
- Add "attributes" to the new Scope concept.
- Deprecate the current "name"/"version" in favor of semantic conventions for these fields in the new "attributes".
With some further discussion of what this means for the existing APIs and SDKs.
Instead, I think there is a single, simple change required:
InstrumentationLibrary
Doing this, coupled with defining semantic conventions for those attributes, would allow for further narrowing the scope of the InstrumentationLibrary
to the point where it covered a single class or method or however users wished to use the flexibility it affords. This requires no changes to existing APIs or SDKs and does not break the expectations of existing users. It adds new control surfaces for users who want them and requires nothing of those that don't.
Instead, I think there is a single, simple change required:
- Add attributes to the
InstrumentationLibrary
Doing this, coupled with defining semantic conventions for those attributes, would allow for further narrowing the scope of the
InstrumentationLibrary
to the point where it covered a single class or method or however users wished to use the flexibility it affords. This requires no changes to existing APIs or SDKs and does not break the expectations of existing users. It adds new control surfaces for users who want them and requires nothing of those that don't.
Last week I thought we had agreed on a variation of this where scope was going to be added as an additional field without removing the existing instr lib name/version fields. That was fine with me and this version proposed by @Aneurysm9 is equally acceptable to me.
Adding attributes to InstrumentationLibrary
was what I had in mind last week. Perhaps I was not clear about my thoughts if the expectation was that only a single new scalar field would be added. That would seem to be a stop-gap to me and likely to lead to further requests for additional scoping fields. Using a set of attributes moves that out of the data model/API realm and into the semantic conventions where we have at least the beginnings of a mechanism for schema evolution that would make changes potentially less impactful.
Using a set of attributes moves that out of the data model/API realm and into the semantic conventions where we have at least the beginnings
Right. I think we're imagining that instrumentation library name would be deprecated in favor of instrumentation.name
, instrumentation.version
, and instrumentation.schema_url
attributes.
As for the standard logger name concept we're referring to, it seems to be a kind of component namespace. Maybe it's component.name
. For a gRPC instrumentation library, maybe, you'd have instrumentation.name=google/something/..../grpc
for the whole library and within the library you'd have loggers named for the big moving pieces of the implementation, like "channels" and "clients" and "loadbalancer", etc.
Some popular JS logging libraries (winston, npmlog) don't even have the concept of a logger name, and those that do offer little to no guidance as to what a logger name should be (bunyan, pino)
I think this deserves emphasis, fwiw. Are we being led by a Java-specific concern? When I look at the origin of java.util.logging.Logger.Name
, it comes back again that the central purpose is to have a handle by which to disable telemetry. The whole point of these names, IMO, is to say turn on verbose logging for an expression like google/something/..../grpc#loadbalancer
Adding a library name such as opentelemetry-log4j-appender
into InstrumentationLibrary.name
seems like it can help troubleshoot issues in some cases, similar to things like opentelemetry-servlet
we have on the tracing side. It would be very inconsistent if we didn't populate the same information for logs I think.
At the same time, in languages where using named loggers is common, we do expect to have many many log messages associated with a single logger name, and the name tends to be a very important aspect of triaging logs since it often is closely mapped to a component of concern, so it does seem useful to group the log messages by it within the protocol, for compression reasons if nothing else. Adding attributes
to InstrumentationLibrary
seems like a reasonable way to continue to keep InstrumentationLibrary.name
meaning the actual library and having that nesting for logger name available. Alternatively, keeping it as an attribute on the log records themselves and expecting gzip to help enough in situations where space matters also seems fine.
Thinking a bit more on servlet, I think the equivalent to logger name there would be the name of the class that implements HttpServlet
. This is currently a span attribute, so there is precedence to just let logger name be a record attribute. If pushing logger name up in some form, I guess we would expect to do something similar for a servlet name, for example.
For @jmacd
instrumentation.schema_url
schema_url
is different, it identifies the semantic conventions, so will not be a semantic convention :)
For @dyladan
Last week I thought we had agreed on a variation of this where scope was going to be added as an additional field without removing the existing instr lib name/version fields.
There are couple of problems that we have to solve:
attributes
and maybe (we can discuss that) some first class things like name
.For the "wire format" where we can rename messages and fields, we can have something like (based on the current documentation in the proto we can probably simply not doing this at all, see what is documented https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/common/v1/common.proto#L77):
// Scope is a message representing the instrumentation scope
message Scope {
// An empty instrumentation library name means the name is unknown.
string **instrumentation_name** = 1; // deprecated, will be available as a semantic convention.
string **instrumentation_version** = 2; // deprecated, will be available as a semantic convention.
string name = 3; // The new relaxed "name". TBD if a special field or also a semantic convention. For me it seems important to be first class citizen.
// Set of labels that describe the instrumentation scope.
repeated opentelemetry.proto.common.v1.AttributeKeyValue attributes = 4;
}
// A collection of Metrics from a Scope within a Resource.
message ScopeMetrics {
// The Instrumentation scope information for the metrics in this message.
// If this field is not set then no Scope is known.
Scope scope = 1;
// A list of metrics that originate from an Instrumentation scope.
repeated Metric metrics = 2;
}
For the "get(string)" API, here is what is documented:
name (required): This name must identify the instrumentation library (e.g. io.opentelemetry.contrib.mongodb). If an application or library has built-in OpenTelemetry instrumentation, both Instrumented library and Instrumentation library may refer to the same library. In that scenario, the name denotes a module name or component name within that library or application. In case an invalid name (null or empty string) is specified, a working Tracer implementation MUST be returned as a fallback rather than returning null or throwing an exception, its name property SHOULD be set to an empty string, and a message reporting that the specified value is invalid SHOULD be logged. A library, implementing the OpenTelemetry API may also ignore this name and return a default instance for all calls, if it does not support "named" functionality (e.g. an implementation which is not even observability-related). A TracerProvider could also return a no-op Tracer here if application owners configure the SDK to suppress telemetry produced by this library.
I can easily make a case that ignoring the name at all is a valid case A library, implementing the OpenTelemetry API may also ignore this name and return a default instance for all calls, if it does not support "named" functionality
, I can make another case that This name must identify the instrumentation library
this is a very light requirement, I can use a class name or any string I want since it "identifies" the "instrumentation library".
If an application or library has built-in OpenTelemetry instrumentation, both Instrumented library and Instrumentation library may refer to the same library. In that scenario, the name denotes a module name or component name within that library or application.
Here it clearly allows me to use anything I want "module/class/component". I am not sure we need to debate anything based on this comment. I can clearly make a case that this does not need to be "unique" (which is not documented at all in the current text).
Besides all these glitches in the documentation that allow us to mostly change anything we want in terms of the semantics of that field, the proposal I think is to actually consolidate on the important requirement (ensure uniqueness which is not documented) and relax on the "artificial" requirement (to represent instrumentation library).
I think the important changes here are:
@Aneurysm9
Adding attributes to
InstrumentationLibrary
was what I had in mind last week. Perhaps I was not clear about my thoughts if the expectation was that only a single new scalar field would be added. That would seem to be a stop-gap to me and likely to lead to further requests for additional scoping fields. Using a set of attributes moves that out of the data model/API realm and into the semantic conventions where we have at least the beginnings of a mechanism for schema evolution that would make changes potentially less impactful.
Set of attributes is totally fine with me as long as the information isn't lost
@jmacd
Right. I think we're imagining that instrumentation library name would be deprecated in favor of
instrumentation.name
,instrumentation.version
, andinstrumentation.schema_url
attributes.
Seems reasonable % bogdan's comment about schema url
Some popular JS logging libraries (winston, npmlog) don't even have the concept of a logger name, and those that do offer little to no guidance as to what a logger name should be (bunyan, pino)
I think this deserves emphasis, fwiw. Are we being led by a Java-specific concern? When I look at the origin of
java.util.logging.Logger.Name
, it comes back again that the central purpose is to have a handle by which to disable telemetry. The whole point of these names, IMO, is to say turn on verbose logging for an expression likegoogle/something/..../grpc#loadbalancer
I don't think this is a Java specific concern. The idea of a logger name is definitely not universal, but certainly I see the usefulness of it and if we're designing a logging API I think it is a worthy inclusion. Some assumptions made about the solution may be specific to Java.
@bogdandrutu
There are couple of problems that we have to solve:
- The representation on the wire. Which we probably will end up having a set of
attributes
and maybe (we can discuss that) some first class things likename
.
Seems fine.
- What to do with the current "get(string)" API which in my opinion is the biggest part of the discussion, since it matches a lot with what lots of logging library have as an API to get a Logger.
[elided...]
Here it clearly allows me to use anything I want "module/class/component". I am not sure we need to debate anything based on this comment. I can clearly make a case that this does not need to be "unique" (which is not documented at all in the current text).
This seems like a creative interpretation to me. Sure, maybe to the letter of the rules it can be done, but removing the uniqueness and meaning from it removes a lot of its usefulness. I believe @Aneurysm9 would agree with me here, but again I'd let him speak for himself.
Besides all these glitches in the documentation that allow us to mostly change anything we want in terms of the semantics of that field, the proposal I think is to actually consolidate on the important requirement (ensure uniqueness which is not documented) and relax on the "artificial" requirement (to represent instrumentation library).
I think the important changes here are:
- Drop the requirement to be the "instrumentation library"
- Add a requirement to uniquely identify that "instrumentation class/component/library/scope".
Uniqueness is not documented because it wasn't required within a single library. Within a library you can have 3 tracers all with the same name. It just needs to not be the same as the name from any other library. Obviously global uniqueness also satisfies this requirement.
Maybe a compromise is to say the name should be globally unique, and one suggested way to do that is to use the fully qualified class name (or similar fully qualified identifier)?
side note: github issues need threads 😆
I would like to challenge the assumption that we can't have getLogger and getTracer with different signatures. If logger name is a well-established concept that's all well and good, but tracer and meter name are not likely to be familiar concepts to new users and already will require some education. I don't think we can just assume users will not be able to understand the differences here.
@dyladan
Maybe a compromise is to say the name should be globally unique, and one suggested way to do that is to use the fully qualified class name (or similar fully qualified identifier)?
+1
Uniqueness is not documented because it wasn't required within a single library. Within a library you can have 3 tracers all with the same name. It just needs to not be the same as the name from any other library. Obviously global uniqueness also satisfies this requirement.
Not sure about where you see "inside a single library" and where that scope comes from. I am saying that as it is right now I can use "io.grpc" as identifier for my "instrumentation library", it does not say to be unique.
@dyladan here is an offer that you cannot refuse :))
Here is a more crazy idea (inspired by also the fact that right now the "name", can be In that scenario, the name denotes a module name or component name within that library or application.
, I want to propose a simple "sed/InstrumentationLibrary/Instrumentation" and we are done :) And this is a serious proposal, I think the "library" part is the one that creates the most problems, and is violated even by our documentation.
@dyladan
Even in your example fully qualified class name (or similar fully qualified identifier)
the class name is not a "library". I think that word is the one that causes the most troubles and if simply remove that we can get the best of both worlds. Also it fixes our own specification since we recommend "a module name or component name within that library or application" which are not libraries.
@dyladan
Maybe a compromise is to say the name should be globally unique, and one suggested way to do that is to use the fully qualified class name (or similar fully qualified identifier)?
+1
/cc @z1c0 @arminru can we verify if this is ok for us?
Uniqueness is not documented because it wasn't required within a single library. Within a library you can have 3 tracers all with the same name. It just needs to not be the same as the name from any other library. Obviously global uniqueness also satisfies this requirement.
Not sure about where you see "inside a single library" and where that scope comes from. I am saying that as it is right now I can use "io.grpc" as identifier for my "instrumentation library", it does not say to be unique.
I was just saying that there is no uniqueness. Within the same library the name would of course always be the same and of course is not unique, but it is unique to that library. 2 different library authors would never clash names if the library name is used. This is the type of "uniqueness" that is important.
@dyladan here is an offer that you cannot refuse :))
Here is a more crazy idea (inspired by also the fact that right now the "name", can be
In that scenario, the name denotes a module name or component name within that library or application.
, I want to propose a simple "sed/InstrumentationLibrary/Instrumentation" and we are done :) And this is a serious proposal, I think the "library" part is the one that creates the most problems, and is violated even by our documentation.Even in your example fully qualified class name (or similar fully qualified identifier) the class name is not a "library". I think that word is the one that causes the most troubles and if simply remove that we can get the best of both worlds. Also it fixes our own specification since we recommend "a module name or component name within that library or application" which are not libraries.
I think relaxing the term "library" is fine as long as we say we need something fully qualified. I don't want the case where the name is just the name of the class (not fully qualified) and clashes with a different library that happens to have a class with the same name.
You're right that the "library" term isn't important to us, as long as we can uniquely identify the component which is generating the telemetry.
/cc @arminru @z1c0 @Oberon00 to make sure i didn't make any wrong claims in the above
100% I want to preserve the ability to isolate an "instrumentation" (being it class/package/library/etc) using that name, same requirements were on the bases of logs as well :)
@dyladan
2 different library authors would never clash names if the library name is used. This is the type of "uniqueness" that is important.
This is what I am trying to say that is not documented right now.
It's not documented it's just a natural consequence of using instrumentation library name
Probably it is fine to relax the need to have this as defined right now "instrumentation library name" and suggest that as a "MAY".
I think we should keep this out of this issue and defer to #586.
Maybe a compromise is to say the name should be globally unique, and one suggested way to do that is to use the fully qualified class name (or similar fully qualified identifier)?
Also not sure if this should be discussed separately from the ability to add arbitrary attributes, which I think is the main topic of this issue(?). But let's try discussing it here:
It is true that the spec right now wants you to have the actual library/artifact name. But I think this was never really followed consistently across languages (e.g. Python uses the module/file name __name__
of the module creating the tracer in many cases, which is different from the distribution/library name, correct me if I'm wrong @open-telemetry/python-maintainers).
I think the actual source / format of the name is not that important, as long as we agree on what it should be useful for, which is IMHO, identifying the technical source of some telemetry for troubleshooting telemetry-related code, usually instrumentation code (not for assigning it to some logical monitored system component -- other span and resource attributes should be used for that like service.name, http.route, code.function, etc.). Ideally, it is not too fine-grained, or if it is fine grained, I think it would make sense to recommend having a common prefix. This would allow discarding / skipping analysis of telemetry data generated by known-faulty/too noisy sources.
It is true that the spec right now wants you to have the actual library/artifact name. But I think this was never really followed consistently across languages (e.g. Python uses the module/file name
__name__
of the module creating the tracer in many cases, which is different from the distribution/library name, correct me if I'm wrong @open-telemetry/python-maintainers).
Do you mean this, @Oberon00?
If that is the case, well, that is just a value the user can pass to a method that is part of the public API. In that case, I think the spec does not "apply" since it is left as a choice for the user.
In that case, I think the spec does not "apply" since it is left as a choice for the user.
I mean, the spec can't force users to do anything, they can also write a new GUID into the span name for each span if they want, but that would break everybody who expects that the span name is what the spec says it should be.
But in that case, I think __name__
ought to be fine as "instrumentation library name".
I mean, the spec can't force users to do anything, they can also write a new GUID into the span name for each span if they want, but that would break everybody who expects that the span name is what the spec says it should be.
If we want the span name to be defined by the spec, shouldn't we add to the spec a requirement that is to be implemented by a non-user defined field somewhere?
If we want the span name to be defined by the spec, shouldn't we add to the spec a requirement that is to be implemented by a non-user defined field somewhere?
We don't want the span name to be defined by the spec, we want the meaning of the span name to be defined by the spec. The same goes for the instrumentation library name. Having a defined meaning should lead to consistency in usage and the ability to reliably make decisions based on the information it conveys rather than treating it simply as bits of data with no informational value.
This is a try to document what is the ideal scenario to extend the protocol to support the notion of "instrumentation scope", and allow the tracer/meter/logger to share the same concepts.
There were multiple proposals, disagreements, ideas in lots of places:
OTLP ideal end goal, based on lots of discussions/comments:
How to get there?
There are few places where changes are necessary: