open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

[cloud provider] `host.id` semantics are too broad #739

Open mx-psi opened 7 months ago

mx-psi commented 7 months ago

host.id is currently used as a catch-all convention for any sort of ID, whether it comes from a cloud provider or from the machine itself. This makes it difficult for vendors to use it to retrieve specific cloud provider IDs.

Currently, a single host will have a single value for host.id. In certain environments you can rely on other cloud.* attributes, such as cloud.platform, to understand what the value of host.id represents. For example, if cloud.platform is aws_ec2, this implicitly ensures that host.id, if present, holds the AWS EC2 instance ID.
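As a rough sketch of the inference described above (not an existing API): the function name `interpret_host_id` and the lookup table are hypothetical, and only the `aws_ec2` entry is taken from this issue.

```python
from typing import Mapping, Optional

# Hypothetical table: cloud.platform value -> what host.id is assumed to hold.
# Only the aws_ec2 entry is stated in this issue; any other entry would be an assumption.
HOST_ID_MEANING = {
    "aws_ec2": "AWS EC2 instance ID",
}


def interpret_host_id(attrs: Mapping[str, str]) -> Optional[str]:
    """Describe what host.id holds, or return None when it cannot be inferred."""
    if "host.id" not in attrs:
        return None
    # Without a recognized cloud.platform, the meaning of host.id is unknown.
    return HOST_ID_MEANING.get(attrs.get("cloud.platform", ""))
```

This only works while a host carries exactly one host.id value and cloud.platform disambiguates it.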

Proposals like #576 make it so a single host may have multiple possible values for host.id; this makes it impossible for a vendor to identify the actual meaning of host.id.

Within the OpenTelemetry GitHub org, these are the current values for host.id other than machine-id:

One solution is to introduce semantic conventions that are specific to a given cloud provider. For example, we currently have gcp.gce.instance.name, and #600 proposes a similar convention for AWS EC2.
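To illustrate how a consumer could use provider-specific conventions, here is a minimal sketch. The helper `provider_specific_ids` and the key tuple are hypothetical; `gcp.gce.instance.name` is the only key named in this issue, and the AWS EC2 key proposed in #600 is not spelled out here, so it is left out.

```python
from typing import Dict, Mapping

# Provider-specific attribute keys a vendor might look for (illustrative, not exhaustive).
PROVIDER_SPECIFIC_KEYS = ("gcp.gce.instance.name",)


def provider_specific_ids(attrs: Mapping[str, str]) -> Dict[str, str]:
    """Return the provider-specific identifiers present, ignoring host.id entirely."""
    return {key: attrs[key] for key in PROVIDER_SPECIFIC_KEYS if key in attrs}
```

With this approach the attribute key itself says which provider the value belongs to, so no cross-referencing with cloud.platform is needed.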

ChrsMark commented 6 months ago

cc @open-telemetry/semconv-system-approvers

AfreasF5 commented 3 months ago

The challenge with any ID is that it is only truly usable in a specific context. Given that a host exists in multiple contexts, should we look at host.id as a graph, with each variation of the ID as a node, and with a graph ID that is shared across the contexts?

mx-psi commented 2 months ago

This is something that the Entities SIG should look into before we make progress.