Add missing ECS cloud fields to Semantic Conventions Cloud Resource attributes

mlunadia commented 9 months ago

What

This issue proposes adding cloud-related fields from the Elastic Common Schema (ECS) which are not in the OpenTelemetry Semantic Conventions specification for Cloud Resource Attributes.

Why

These fields provide valuable context for understanding and analysing application performance and behaviour across cloud environments. They make it possible to analyse performance differences based on cloud configuration (e.g., account name for companies using multiple accounts, or machine type to relate performance to cost) and to better understand the impact of cloud infrastructure on application behaviour.

List of fields proposed for addition

Attribute	Type	Description	Examples
cloud.account.name	string	Cloud account name/alias	elastic-dev
cloud.instance.id	string	Instance ID	i-1234567890abcdef0
cloud.instance.name	string	Instance name	jenkins-1
cloud.machine.type	string	Machine type	t2.medium
cloud.project.id	string	Cloud project identifier	my-project
cloud.project.name	string	Cloud project name	project
cloud.service.name	string	Cloud service name	ec2
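To make the proposal concrete, here is a minimal sketch of how some of the proposed attributes could be attached today with the Collector's `resource` processor. It assumes the collector-contrib distribution; the processor name `resource/ecs_cloud`, the OTLP receiver and the `debug` exporter are illustrative choices, and the values are the example values from the table, not real account data. Because these keys are not part of the semantic conventions, backends would treat them as ordinary custom attributes.

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  # Hypothetical: statically attach proposed (not yet specified) cloud.*
  # attributes, using the example values from the field table.
  resource/ecs_cloud:
    attributes:
      - key: cloud.account.name
        value: elastic-dev
        action: upsert
      - key: cloud.project.id
        value: my-project
        action: upsert
      - key: cloud.service.name
        value: ec2
        action: upsert

exporters:
  debug: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource/ecs_cloud]
      exporters: [debug]
```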

This PR (currently closed) implements this issue.

pyohannes commented 9 months ago

How would cloud.instance.id, cloud.instance.name, and cloud.machine.type relate to host.id, host.name, and host.type?

The description of host.type currently says:

For Cloud, this must be the machine type.

mlunadia commented 9 months ago

Good points, @pyohannes. Because ECS uses a flat field structure, it made sense to add all of them, but since the pairs below might be mutually exclusive, we can consider removing them from the PR:

cloud.instance.id - host.id
cloud.instance.name - host.name
cloud.machine.type - host.type
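If the overlapping fields are dropped, data that already carries the ECS-style names could be mapped onto the existing `host.*` attributes instead. A rough sketch with the Collector's `resource` processor, assuming the incoming resource uses the ECS keys; the processor name `resource/ecs_to_host` is hypothetical:

```yaml
processors:
  # Sketch: copy ECS-style cloud.instance.* / cloud.machine.type values into
  # host.*, then drop the duplicates. Upserts are skipped when the source
  # attribute is absent.
  resource/ecs_to_host:
    attributes:
      - key: host.id
        from_attribute: cloud.instance.id
        action: upsert
      - key: host.name
        from_attribute: cloud.instance.name
        action: upsert
      - key: host.type
        from_attribute: cloud.machine.type
        action: upsert
      - key: cloud.instance.id
        action: delete
      - key: cloud.instance.name
        action: delete
      - key: cloud.machine.type
        action: delete
```

Because `upsert` overwrites an existing value, this would replace whatever `host.name` a host detector had already set, so processor ordering matters.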

cc: @mx-psi @ChrsMark @frzifus @dineshg13 @braydonk, who worked on the system semantic conventions, for comment.

kaiyan-sheng commented 8 months ago

Agree! Thanks @pyohannes! We are already using host.id and host.name in our cloud provider monitoring solutions. host.type also makes total sense in this case!

mx-psi commented 8 months ago

I think removing cloud.instance.id, cloud.instance.name, and cloud.machine.type from the list makes sense.

ChrsMark commented 8 months ago

This seems to also be relevant to https://github.com/open-telemetry/semantic-conventions/pull/576, https://github.com/open-telemetry/semantic-conventions/issues/739 and https://github.com/open-telemetry/semantic-conventions/pull/600

In general I like the idea of re-using the host.* attributes but on the other hand I find it difficult to control this overloading approach.

For example, we already have gcp.gce.instance.name, but how is this different from host.name? If there is a need for provider-specific attributes, then it would be more future-proof to have a unified one called cloud.instance.name, right?

Also, we should be very specific about how we leverage the resource hierarchy here. For example, in a Kubernetes environment, host.name can take 3 different values depending on the Collector config:

  processors:
    resourcedetection/system:
      detectors: [ "system" ]
      system:
        hostname_sources: [ "lookup", "cname", "dns", "os" ]
        resource_attributes:
          host.name:
            enabled: true
    resourcedetection/gcp:
      detectors: [ env, gcp ]
      timeout: 2s
      override: false

a) If we add the gcp resource detector with override: true, the value will be the name of the GCP machine.
b) If we run the Collector as a Pod with hostNetwork: true, the value is the name of the k8s Node (==GCP node) + the dnsdomainname.
c) If we run the Collector as a Pod with hostNetwork: false, the value is the name of the Pod.
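As an illustration of cases a) to c), the following hypothetical manifest runs the Collector as a DaemonSet with the GCP detector set to `override: true` and with `hostNetwork: true`. Resource names and the image tag are placeholders, and the Collector config is trimmed to a minimal OTLP-to-`debug` traces pipeline; treat it as a sketch for experimenting with the three outcomes, not a recommended deployment.

```yaml
# Hypothetical manifest: names and image tag are illustrative only.
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    processors:
      resourcedetection/system:
        detectors: [ "system" ]
        system:
          hostname_sources: [ "lookup", "cname", "dns", "os" ]
      resourcedetection/gcp:
        detectors: [ env, gcp ]
        timeout: 2s
        # Case a): overwrite the host.name set by the system detector
        override: true
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [resourcedetection/system, resourcedetection/gcp]
          exporters: [debug]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      # Case b): share the node's network namespace, so the system detector
      # resolves the node's hostname rather than the Pod's.
      hostNetwork: true
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /conf
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
```

Setting `override` back to `false` and removing `hostNetwork` (it defaults to `false`) should reproduce case c), where `host.name` is the Pod name.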

[Screenshot: host name]

This can be very confusing for users, especially in multi-tenant infrastructures with multiple teams running multiple Collector instances per org/team/namespace.

So, to my mind, we should be very specific here and either:
1) introduce the cloud.* specific values to ensure we don't mix things up and leave our users with misleading outcomes. This is mostly based on the idea @jsuereth proposed at https://github.com/open-telemetry/semantic-conventions/pull/600#discussion_r1506377547, if I'm not mistaken; or
2) re-use host.* with very strict guidance on what that means for implementations in cloud environments.
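To show what the two options would mean for the same workload, here are illustrative resource attributes for a Collector Pod running with `hostNetwork: false` on a cloud VM. The values reuse the examples from the field table plus a made-up Pod name; `cloud.instance.*` and `cloud.machine.type` are the proposed attributes, not part of the current specification.

```yaml
# Option 1: dedicated cloud.* attributes. host.name keeps whatever the
# Collector resolved (here, the Pod name) and nothing is overloaded.
option_1:
  host.name: otel-collector-7d9f8   # hypothetical Pod name
  cloud.instance.id: i-1234567890abcdef0
  cloud.instance.name: jenkins-1
  cloud.machine.type: t2.medium

# Option 2: reuse host.*, with strict guidance that in cloud environments
# host.* must describe the cloud instance, never the Pod or container.
option_2:
  host.id: i-1234567890abcdef0
  host.name: jenkins-1
  host.type: t2.medium
```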

Let me know what you folks think, or if I'm missing something here.

open-telemetry / semantic-conventions

Add missing ECS cloud fields to Semantic Conventions Cloud Resource attributes #761