open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.75k stars 888 forks source link

Guidance for filling (or consider removing?) service.instance.id #1034

Closed anuraaga closed 4 months ago

anuraaga commented 4 years ago

Resource defines service.instance.id as a unique ID among horizontally scaled services

https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions#service

But we have many similar IDs in other places host.id k8s.pod.name or uid container.id faas.instance

Should one of these be copied into service.instance.id when it's available? One issue is ordering, host.id is the instance ID for non-container workloads but not container workloads, for example.

Or can we just remove service.instance.id since we would expect an ID of horizontal scaling to be present in another convention?

Oberon00 commented 4 years ago

A host can host multiple services. A POD might host multiple (related) services. A container might host multiple services (even though that's unusual). I can't think of a scenario wher FaaS would have more than one service instance.

I agree though that most of the time, except for host, everything should map 1:1 to service.instance.id, so that would be reasonable choices.

However, since only the triplet service.(namespace, name, id) is supposed to be unique, I think that we should not auto-fill service.instance.id by default with anything -- it seems to be predestined for an "inherent unique ID" as the spec puts it. I don't think we should remove it, it is fine for that use case.

I think it may make a lot of sense to add a telemetry.instance_id to be able to group resources that came from the very same initialization of OpenTelemetry, i.e. something that is usable as primary key for resources. Note that the service namespace/name/id triplet is not usable for that as it stays (or should stay at least) the same over restarts of the same service instance.

anuraaga commented 4 years ago

is not usable for that as it stays (or should stay at least) the same over restarts of the same service instance.

Hmm - in that case should we consider removing the recommendation to "generate a random Version 1 or Version 4 RFC 4122 UUID"? I don't think it can ever be possible for the UUID to be the same over restarts. I guess a user needs to set it appropriately in that case.

Oberon00 commented 4 years ago

Spec says:

It is preferable for the ID to be persistent and stay the same for the lifetime of the service instance, however it is acceptable that the ID is ephemeral and changes during important lifetime events for the service (e.g. service restarts).

And also later:

services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 for more recommendations

It would be cool if the spec could also write the "nicknames" for these versions, e.g. V 5 requires a name and namespace as input, e.g. Python's implementation has this:

uuid.uuid5(namespace, name) Generate a UUID based on the SHA-1 hash of a namespace identifier (which is a UUID) and a name (which is a string).

anuraaga commented 4 years ago

I was thinking a bit more on this - @Oberon00 just wondering, how would a backend use this value with the current definition? Isn't it a bit too ambiguous? If the value is always a useful, persistent identifier for the instance among a group of them, that's useful, but if we also allow a fallback to a random UUID V1 / V5 then isn't the attribute too unreliable to do anything meaningful?

Oberon00 commented 4 years ago

I don't think the current definition is suitable for using it much generically. It probably requires user-side knowledge to interpret. Still, grouping by instance ID groups at least groups all the spans by one running OpenTelemetry SDK instance together.

andrewhsu commented 4 years ago

from issue triage mtg today, setting as p3 for editorial change if it comes in before ga

Oberon00 commented 3 years ago

Since service.instance.id is "required" in the service resource, this issue does not interact well with #1269 "Define the fallback case for service.name": If the SDK sets a value for service.name automatically, service.instance.id may be missing, resulting in an invalid resource produced by default. See https://github.com/open-telemetry/opentelemetry-specification/pull/1269#discussion_r540765852

carlosalberto commented 3 years ago

Changed priority to P2 to have this issue as a follow up.

Oberon00 commented 3 years ago

Repeating my comment from https://github.com/open-telemetry/opentelemetry-specification/pull/1269#discussion_r540765852

IMHO this attribute is poorly defined right now as it may or may not be the same across service restarts, which IMHO can make quite a difference. It would be easiest if it MUST be the different for each restart, that way it could be used as primary key for all resources (not only service.*) sent by the same telemetry instance. On the other hand, maybe such an attribute would better be named telemetry.sdk.instance.id.

(BTW, why instance.id instead of instance_id?)


To sum it up, I think we should split this attribute up into (names are just ideas):

carlosalberto commented 3 years ago

To sum it up, I think we should split this attribute up into (names are just ideas)

Thanks @Oberon00, I think this makes sense. Not sure on the names themselves but the idea of having two attributes (one optional, set by the user, and another one generated per-restart by the SDK) seems like the way to go here.

aabmass commented 3 years ago

@Oberon00 @anuraaga any updates on this issue? Looking at what we have in the Python SDK right now, the default behavior violates the requirement that "service.namespace,service.name,service.instance.id triplet MUST be globally unique".

We are prototyping metrics and this presents the problem that multiple instances of a service are likely to violate the single-writer requirement because of the identical resources. I was considering doing the same as open-telemetry/opentelemetry-java#1726 to solve this, but it doesn't seem like we are settled on a solution.

Oberon00 commented 3 years ago

I don't really have much stake in this issue (anymore). I think it would be best if you create a PR since you seem to have an actual use case.

svrnm commented 2 years ago

I stumbled across this issue recently as well, so I would be curious to get this conversation re-started.

A few thoughts:

Reg. service.deployment_id. That's an interesting addition. However, I am wondering if this should be part of Deployment, i.e. deployment.id or similar? Or would it make more sense to add deployment.environment to service, i.e. service.deployment.environment and service.deployment.id?

Rg. telemetry.sdk.instance_id: would it be possible to reload/replace the SDK at runtime? Let's say I use an auto instrumentation agent and make a hot update? Would that id change in that case?

mtwo commented 4 months ago

This appears to be closed by https://github.com/open-telemetry/semantic-conventions/pull/312. Please re-open if this isn't correct!