Closed jack-berg closed 5 months ago
Adding my 2 cents to that:
I would prefer a solution where the SDK is providing the service.instance.id if it is not already defined by the end user (via environment variable, config or code). It's not something most end-users care about until it comes to observability where they want to say "give me all data for that one particular service instance" and then they figure out they have no unique identifier...
I also don't think it should be persistent across restarts, here's my rational for that: Right now if I have a highly scaled service with lots of nodes I can not identify which service is the troublemaker, except through other IDs that might be available if running on K8s (pod.id), some container runtime (container.id*) or a physical host (host.id), etc., but it's not their purpose to identify a service, which brings us back to a restart: if I (soft) restart a service within a container, the pod&container&host ID stays the same: So in the worst case I have a service that restarted a few times, but I can not uniquely identify an instance.
This may be a duplicate of https://github.com/open-telemetry/opentelemetry-specification/issues/1034, but that issue's description appears to be asking about alternates for service.instance.id where I'm asking about what is specifically responsible for generating service.instance.id in the current wording of the spec.
I think that is mostly what the discussion on #1034 is about. Repeating my comment from there: https://github.com/open-telemetry/opentelemetry-specification/issues/1034#issuecomment-745262092:
IMHO this attribute is poorly defined right now as it may or may not be the same across service restarts, which IMHO can make quite a difference. It would be easiest if it MUST be the different for each restart, that way it could be used as primary key for all resources (not only service.*) sent by the same telemetry instance. On the other hand, maybe such an attribute would better be named telemetry.sdk.instance.id.
As a datapoint, in the Erlang SDK we use the node name as the instance id resource attribute. If the node is run without distributed erlang enabled (so no nodename) we simply use a random integer as the id -- same as using the default otel trace id generator.
nodename is somename@IP
or somename@hostname
, so stays the same between restarts if the node is on the same ip/hostname.
I also don't think it should be persistent across restarts, here's my rational for that: Right now if I have a highly scaled service with lots of nodes I can not identify which service is the troublemaker, except through other IDs that might be available if running on K8s (pod.id), some container runtime (container.id*) or a physical host (host.id), etc., but it's not their purpose to identify a service, which brings us back to a restart: if I (soft) restart a service within a container, the pod&container&host ID stays the same: So in the worst case I have a service that restarted a few times, but I can not uniquely identify an instance.
I have opposite use case: I have exactly 2 instances of service X and 10 of Y. These instances are numbered (1-2, 1-10) and their numbers are statically assigned to them during deployment. It makes sense to put them in service.instance.id
as-is, and use resource attributes from Kubernetes semantic conventions to store other details.
I think both an ID that persists over restarts and one that doesn't can make sense to have, for different use cases. Also, the former is harder to impalement while the latter is trivial. There should probably be two different attributes here.
I think both an ID that persists over restarts and one that doesn't can make sense to have, for different use cases. +1
Agreed, there should be 2 different attributes, the questions remains which one is then "service.instance.id" and which one is something else?
I'm actually in favor of having the SDK generate this. The main concern, and others have raised this, is that we don't have a good specification for what to generate. I'm actually looking to tackle that and would be happy to brainstorm with you on what we could do.
Copying this here from #3202.
My requirements for a solution to this bug:
Ok, my proposal - Looking for feedback:
https://docs.google.com/document/d/1BenPf9vsZHCf4JpHWGQBydKZAA4XdH38wuMD7JQnz9A/edit?usp=sharing
@jsuereth I think your proposal looks good.
Thanks @jmacd. I'll PR-ify it today/tomorrow.
Wanted to revive this. A colleague found this issue as we were researching the use of service.instance.id
and the best way to generate one. Has there been any other movement on this question?
Additionally we need to make it clear that later elements of collection pipeline may override the service.instance.id if they have a better value for it.
This is for example is the case in resourcedetection processor in the Collector.
How do the later components know their value is better? Would there be any way to distinguish a user-defined instance.id from an SDK-generated one? Should there be?
How do the later components know their value is better?
Judgement call by the component developer or by the end user. If you are certain the data you have is more "correct" than what is coming from the previous component then you overwrite it. In the Collector this is end-user configurable (see for example).
Would there be any way to distinguish a user-defined instance.id from an SDK-generated one? Should there be?
I don't expect there to be a way.
@jack-berg is this resolved by https://github.com/open-telemetry/semantic-conventions/pull/312?
Yes! Closing.
The resource semantic conventions say the following about
service.instance.id
:The language is ambiguous as it's not clear "what" / "who" is supposed to generate the random Version 1 or Version 4 UUID. Is it the application owner? Is it the language SDK?
There's a request in
opentelemetry-java
that we generate service.instance.id. We discussed adding it in the Java SIG and there was debate about whether this was actually the intent of the spec, and if it is, whether its a good idea. The argument against including it can probably be summarized as:service.instance.id
to an inherent unique ID if available.Maybe I'm missing part of the debate but in any case it would be good if the spec could clear up this ambiguity.
This may be a duplicate of #1034, but that issue's description appears to be asking about alternates for service.instance.id where I'm asking about what is specifically responsible for generating service.instance.id in the current wording of the spec.