open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.73k stars 888 forks source link

Should language SDK generate service.instance.id? #3136

Closed jack-berg closed 5 months ago

jack-berg commented 1 year ago

The resource semantic conventions say the following about service.instance.id:

If the service has no inherent unique ID that can be used as the value of this attribute it is recommended to generate a random Version 1 or Version 4 RFC 4122 UUID (services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 for more recommendations).

The language is ambiguous as it's not clear "what" / "who" is supposed to generate the random Version 1 or Version 4 UUID. Is it the application owner? Is it the language SDK?

There's a request in opentelemetry-java that we generate service.instance.id. We discussed adding it in the Java SIG and there was debate about whether this was actually the intent of the spec, and if it is, whether its a good idea. The argument against including it can probably be summarized as:

Maybe I'm missing part of the debate but in any case it would be good if the spec could clear up this ambiguity.

This may be a duplicate of #1034, but that issue's description appears to be asking about alternates for service.instance.id where I'm asking about what is specifically responsible for generating service.instance.id in the current wording of the spec.

svrnm commented 1 year ago

Adding my 2 cents to that:

I would prefer a solution where the SDK is providing the service.instance.id if it is not already defined by the end user (via environment variable, config or code). It's not something most end-users care about until it comes to observability where they want to say "give me all data for that one particular service instance" and then they figure out they have no unique identifier...

I also don't think it should be persistent across restarts, here's my rational for that: Right now if I have a highly scaled service with lots of nodes I can not identify which service is the troublemaker, except through other IDs that might be available if running on K8s (pod.id), some container runtime (container.id*) or a physical host (host.id), etc., but it's not their purpose to identify a service, which brings us back to a restart: if I (soft) restart a service within a container, the pod&container&host ID stays the same: So in the worst case I have a service that restarted a few times, but I can not uniquely identify an instance.

Oberon00 commented 1 year ago

This may be a duplicate of https://github.com/open-telemetry/opentelemetry-specification/issues/1034, but that issue's description appears to be asking about alternates for service.instance.id where I'm asking about what is specifically responsible for generating service.instance.id in the current wording of the spec.

I think that is mostly what the discussion on #1034 is about. Repeating my comment from there: https://github.com/open-telemetry/opentelemetry-specification/issues/1034#issuecomment-745262092:

IMHO this attribute is poorly defined right now as it may or may not be the same across service restarts, which IMHO can make quite a difference. It would be easiest if it MUST be the different for each restart, that way it could be used as primary key for all resources (not only service.*) sent by the same telemetry instance. On the other hand, maybe such an attribute would better be named telemetry.sdk.instance.id.

tsloughter commented 1 year ago

As a datapoint, in the Erlang SDK we use the node name as the instance id resource attribute. If the node is run without distributed erlang enabled (so no nodename) we simply use a random integer as the id -- same as using the default otel trace id generator.

nodename is somename@IP or somename@hostname, so stays the same between restarts if the node is on the same ip/hostname.

sirzooro commented 1 year ago

I also don't think it should be persistent across restarts, here's my rational for that: Right now if I have a highly scaled service with lots of nodes I can not identify which service is the troublemaker, except through other IDs that might be available if running on K8s (pod.id), some container runtime (container.id*) or a physical host (host.id), etc., but it's not their purpose to identify a service, which brings us back to a restart: if I (soft) restart a service within a container, the pod&container&host ID stays the same: So in the worst case I have a service that restarted a few times, but I can not uniquely identify an instance.

I have opposite use case: I have exactly 2 instances of service X and 10 of Y. These instances are numbered (1-2, 1-10) and their numbers are statically assigned to them during deployment. It makes sense to put them in service.instance.id as-is, and use resource attributes from Kubernetes semantic conventions to store other details.

Oberon00 commented 1 year ago

I think both an ID that persists over restarts and one that doesn't can make sense to have, for different use cases. Also, the former is harder to impalement while the latter is trivial. There should probably be two different attributes here.

svrnm commented 1 year ago

I think both an ID that persists over restarts and one that doesn't can make sense to have, for different use cases. +1

Agreed, there should be 2 different attributes, the questions remains which one is then "service.instance.id" and which one is something else?

jsuereth commented 1 year ago

I'm actually in favor of having the SDK generate this. The main concern, and others have raised this, is that we don't have a good specification for what to generate. I'm actually looking to tackle that and would be happy to brainstorm with you on what we could do.

jsuereth commented 1 year ago

Copying this here from #3202.

My requirements for a solution to this bug:

jsuereth commented 1 year ago

Ok, my proposal - Looking for feedback:

https://docs.google.com/document/d/1BenPf9vsZHCf4JpHWGQBydKZAA4XdH38wuMD7JQnz9A/edit?usp=sharing

jmacd commented 1 year ago

@jsuereth I think your proposal looks good.

jsuereth commented 1 year ago

Thanks @jmacd. I'll PR-ify it today/tomorrow.

spedersen-emailage commented 1 year ago

Wanted to revive this. A colleague found this issue as we were researching the use of service.instance.id and the best way to generate one. Has there been any other movement on this question?

tigrannajaryan commented 1 year ago

Additionally we need to make it clear that later elements of collection pipeline may override the service.instance.id if they have a better value for it.

This is for example is the case in resourcedetection processor in the Collector.

Oberon00 commented 1 year ago

How do the later components know their value is better? Would there be any way to distinguish a user-defined instance.id from an SDK-generated one? Should there be?

tigrannajaryan commented 1 year ago

How do the later components know their value is better?

Judgement call by the component developer or by the end user. If you are certain the data you have is more "correct" than what is coming from the previous component then you overwrite it. In the Collector this is end-user configurable (see for example).

Would there be any way to distinguish a user-defined instance.id from an SDK-generated one? Should there be?

I don't expect there to be a way.

dyladan commented 5 months ago

@jack-berg is this resolved by https://github.com/open-telemetry/semantic-conventions/pull/312?

jack-berg commented 5 months ago

Yes! Closing.