Open mlunadia opened 9 months ago
How would cloud.instance.id, cloud.instance.name, and cloud.machine.type relate to host.id, host.name, and host.type?
The description of host.type currently says: "For Cloud, this must be the machine type."
Good points @pyohannes. Because ECS has a flat field structure, it made sense to add them all, but since the pairs below might be mutually exclusive, we can consider removing them from the PR:
cloud.instance.id - host.id
cloud.instance.name - host.name
cloud.machine.type - host.type
cc: @mx-psi @ChrsMark @frzifus @dineshg13 @braydonk who worked on the system semantic conventions for comment.
Agree! Thanks @pyohannes! We are already using host.id and host.name in our cloud provider monitoring solutions. host.type also makes total sense in this case!
I think removing cloud.instance.id, cloud.instance.name, and cloud.machine.type from the list makes sense.
This seems to also be relevant to https://github.com/open-telemetry/semantic-conventions/pull/576, https://github.com/open-telemetry/semantic-conventions/issues/739 and https://github.com/open-telemetry/semantic-conventions/pull/600
In general I like the idea of re-using the host.* attributes, but on the other hand I find it difficult to control this overloading approach.
For example, we already have gcp.gce.instance.name, but how is this different from host.name?
If there is a need for provider-specific attributes, then it would be more future-proof to have a unified one called cloud.instance.name, right?
Also we should be very specific on how we leverage the resource hierarchy here.
For example, in a Kubernetes environment, host.name can take 3 different values depending on the Collector config:
processors:
  resourcedetection/system:
    detectors: [ "system" ]
    system:
      hostname_sources: [ "lookup", "cname", "dns", "os" ]
      resource_attributes:
        host.name:
          enabled: true
  resourcedetection/gcp:
    detectors: [ env, gcp ]
    timeout: 2s
    override: false
a) If we add the gcp resource detector with override: true, it will be the name of the GCP machine.
b) If we run the Collector as a Pod with hostNetwork: true, then the value is the name of the k8s Node (== GCP node) + the dnsdomainname.
c) If we run the Collector as a Pod with hostNetwork: false, then the value is the name of the Pod.
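To make the three cases above easier to compare, here is a minimal Python sketch of how host.name could end up resolved. It is only a model, not Collector code: the function names (apply_detector, system_host_name, resolve_host_name) and the example node, domain, and Pod names are hypothetical, and it assumes the system processor runs before the gcp processor in the pipeline.

```python
from typing import Dict

Resource = Dict[str, str]


def apply_detector(resource: Resource, detected: Resource, override: bool) -> Resource:
    """Merge detected attributes into a resource.

    Models the `override` setting: when False, attributes that are already
    present are preserved; when True, detected values replace them.
    """
    merged = dict(resource)
    for key, value in detected.items():
        if override or key not in merged:
            merged[key] = value
    return merged


def system_host_name(host_network: bool, node_name: str, dns_domain: str, pod_name: str) -> str:
    """Simplified view of the hostname seen from inside the Collector Pod."""
    if host_network:
        # The Pod shares the Node's network namespace, so it sees the Node's name.
        return f"{node_name}.{dns_domain}"
    # Otherwise the Pod has its own hostname, which is the Pod name.
    return pod_name


def resolve_host_name(host_network: bool, gcp_override: bool) -> str:
    # Hypothetical example values, not taken from a real cluster.
    node, domain, pod = "gke-node-1", "c.my-project.internal", "otel-collector-abc12"
    resource: Resource = {}
    # Assumption: resourcedetection/system runs first.
    detected_by_system = {"host.name": system_host_name(host_network, node, domain, pod)}
    resource = apply_detector(resource, detected_by_system, override=True)
    # Assumption: resourcedetection/gcp runs second and reports the instance name.
    resource = apply_detector(resource, {"host.name": node}, override=gcp_override)
    return resource["host.name"]


if __name__ == "__main__":
    print("a)", resolve_host_name(host_network=True, gcp_override=True))
    print("b)", resolve_host_name(host_network=True, gcp_override=False))
    print("c)", resolve_host_name(host_network=False, gcp_override=False))
```

The same attribute key yields a machine name, a Node FQDN, or a Pod name depending only on deployment and config choices, which is the ambiguity described in a) to c).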
This can be very confusing for users, especially in multi-tenant infrastructures where multiple teams run multiple Collector instances per org/team/namespace.
So to my mind we should be very specific here and either:
1) introduce the cloud.* specific values to ensure we don't mix things and have our users end up with misleading outcomes. This is mostly based on the idea @jsuereth proposed at https://github.com/open-telemetry/semantic-conventions/pull/600#discussion_r1506377547, if I'm not mistaken.
2) or re-use the host.* attributes with very strict guidance on what that means for implementations in cloud environments.
Let me know what you folks think or if I'm missing something here.
What
This issue proposes adding cloud-related fields from the Elastic Common Schema (ECS) which are not in the OpenTelemetry Semantic Conventions specification for Cloud Resource Attributes.
Why
These fields provide valuable context, enabling a better understanding and analysis of application performance and behaviour across cloud environments. They make it possible to analyse performance differences based on cloud configuration (e.g., account name for companies using multiple accounts, machine type to help understand related performance and cost, etc.) and to better understand the impact of cloud infrastructure on application behaviour.
List of fields proposed for addition
This PR (currently closed) implements this issue.