Closed anuraaga closed 4 months ago
A host can host multiple services. A POD might host multiple (related) services. A container might host multiple services (even though that's unusual). I can't think of a scenario wher FaaS would have more than one service instance.
I agree though that most of the time, except for host, everything should map 1:1 to service.instance.id, so that would be reasonable choices.
However, since only the triplet service.(namespace, name, id) is supposed to be unique, I think that we should not auto-fill service.instance.id by default with anything -- it seems to be predestined for an "inherent unique ID" as the spec puts it. I don't think we should remove it, it is fine for that use case.
I think it may make a lot of sense to add a telemetry.instance_id
to be able to group resources that came from the very same initialization of OpenTelemetry, i.e. something that is usable as primary key for resources. Note that the service namespace/name/id triplet is not usable for that as it stays (or should stay at least) the same over restarts of the same service instance.
is not usable for that as it stays (or should stay at least) the same over restarts of the same service instance.
Hmm - in that case should we consider removing the recommendation to "generate a random Version 1 or Version 4 RFC 4122 UUID"? I don't think it can ever be possible for the UUID to be the same over restarts. I guess a user needs to set it appropriately in that case.
Spec says:
It is preferable for the ID to be persistent and stay the same for the lifetime of the service instance, however it is acceptable that the ID is ephemeral and changes during important lifetime events for the service (e.g. service restarts).
And also later:
services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 for more recommendations
It would be cool if the spec could also write the "nicknames" for these versions, e.g. V 5 requires a name and namespace as input, e.g. Python's implementation has this:
uuid.uuid5(namespace, name)
Generate a UUID based on the SHA-1 hash of a namespace identifier (which is a UUID) and a name (which is a string).
I was thinking a bit more on this - @Oberon00 just wondering, how would a backend use this value with the current definition? Isn't it a bit too ambiguous? If the value is always a useful, persistent identifier for the instance among a group of them, that's useful, but if we also allow a fallback to a random UUID V1 / V5 then isn't the attribute too unreliable to do anything meaningful?
I don't think the current definition is suitable for using it much generically. It probably requires user-side knowledge to interpret. Still, grouping by instance ID groups at least groups all the spans by one running OpenTelemetry SDK instance together.
from issue triage mtg today, setting as p3 for editorial change if it comes in before ga
Since service.instance.id is "required" in the service resource, this issue does not interact well with #1269 "Define the fallback case for service.name": If the SDK sets a value for service.name automatically, service.instance.id may be missing, resulting in an invalid resource produced by default. See https://github.com/open-telemetry/opentelemetry-specification/pull/1269#discussion_r540765852
Changed priority to P2 to have this issue as a follow up.
Repeating my comment from https://github.com/open-telemetry/opentelemetry-specification/pull/1269#discussion_r540765852
IMHO this attribute is poorly defined right now as it may or may not be the same across service restarts, which IMHO can make quite a difference. It would be easiest if it MUST be the different for each restart, that way it could be used as primary key for all resources (not only service.*) sent by the same telemetry instance. On the other hand, maybe such an attribute would better be named telemetry.sdk.instance.id.
(BTW, why instance.id instead of instance_id?)
To sum it up, I think we should split this attribute up into (names are just ideas):
To sum it up, I think we should split this attribute up into (names are just ideas)
Thanks @Oberon00, I think this makes sense. Not sure on the names themselves but the idea of having two attributes (one optional, set by the user, and another one generated per-restart by the SDK) seems like the way to go here.
@Oberon00 @anuraaga any updates on this issue? Looking at what we have in the Python SDK right now, the default behavior violates the requirement that "service.namespace,service.name,service.instance.id
triplet MUST be globally unique".
We are prototyping metrics and this presents the problem that multiple instances of a service are likely to violate the single-writer requirement because of the identical resources. I was considering doing the same as open-telemetry/opentelemetry-java#1726 to solve this, but it doesn't seem like we are settled on a solution.
I don't really have much stake in this issue (anymore). I think it would be best if you create a PR since you seem to have an actual use case.
I stumbled across this issue recently as well, so I would be curious to get this conversation re-started.
A few thoughts:
Reg. service.deployment_id. That's an interesting addition. However, I am wondering if this should be part of Deployment, i.e. deployment.id or similar? Or would it make more sense to add deployment.environment
to service, i.e. service.deployment.environment
and service.deployment.id
?
Rg. telemetry.sdk.instance_id: would it be possible to reload/replace the SDK at runtime? Let's say I use an auto instrumentation agent and make a hot update? Would that id change in that case?
This appears to be closed by https://github.com/open-telemetry/semantic-conventions/pull/312. Please re-open if this isn't correct!
Resource defines
service.instance.id
as a unique ID among horizontally scaled serviceshttps://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions#service
But we have many similar IDs in other places
host.id
k8s.pod.name
oruid
container.id
faas.instance
Should one of these be copied into
service.instance.id
when it's available? One issue is ordering,host.id
is the instance ID for non-container workloads but not container workloads, for example.Or can we just remove
service.instance.id
since we would expect an ID of horizontal scaling to be present in another convention?