open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Add resource attribute for number of cores #977

Open rockdaboot opened 6 months ago

rockdaboot commented 6 months ago

Area(s)

area:host, area:system

Is your change request related to a problem? Please describe.

The profiling agent that Elastic recently donated to OTEL sends the number of logical CPU cores present as host metadata.

Currently, the field name is specific to Elastic and should be changed to something that fits the OTEL semantic conventions.

Describe the solution you'd like

Add a new resource attribute like system.cpu.logical.count (or under host.cpu.*).
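A minimal sketch of what reporting such an attribute could look like, assuming the name `system.cpu.logical.count` proposed above (it is not an accepted convention) and a hypothetical helper `host_cpu_attributes`. It uses only the Python standard library and returns a plain dict rather than an OTEL SDK resource:

```python
import os

# Name proposed in this issue; NOT an accepted semantic convention.
PROPOSED_ATTRIBUTE = "system.cpu.logical.count"


def host_cpu_attributes() -> dict:
    """Return the logical CPU count under the proposed attribute name.

    Hypothetical helper for illustration only.
    """
    count = os.cpu_count()  # None if the count cannot be determined
    if count is None:
        return {}
    return {PROPOSED_ATTRIBUTE: count}


if __name__ == "__main__":
    print(host_cpu_attributes())
```

The dict is left empty when the count is unknown, so a consumer never sees a placeholder value.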

A more generic approach would be to reflect the CPU topology, similar to what Linux systems present (see https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu), or at least to keep that in mind when deciding on a name.
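As a rough illustration of the Linux view mentioned above, the sketch below reads the `present`, `online` and `possible` CPU lists from `/sys/devices/system/cpu` and expands the range format they use (for example `0-3,8-11`). The helper names `parse_cpu_list` and `read_cpu_set` are hypothetical, and the code assumes a Linux host; elsewhere `read_cpu_set` simply returns `None`:

```python
from pathlib import Path
from typing import List, Optional

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> List[int]:
    """Expand a CPU list such as '0-3,8-11' into individual CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_cpu_set(name: str) -> Optional[List[int]]:
    """Read one of 'present', 'online' or 'possible'; None if unavailable."""
    try:
        return parse_cpu_list((SYSFS_CPU / name).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    for name in ("present", "online", "possible"):
        cpus = read_cpu_set(name)
        print(name, None if cpus is None else len(cpus))
```

The three lists can differ on the same host, which is one reason the attribute name should say which count it carries.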

Describe alternatives you've considered

Keep using a field name outside the OTEL semantic conventions, as is done currently. This is not ideal for future compatibility, since it could in theory conflict with names added later.

Additional context

No response

mx-psi commented 3 months ago

We had a long discussion on #99 about whether this should be an attribute or a metric and we decided it would be a metric. You can see https://github.com/open-telemetry/semantic-conventions/pull/99#discussion_r1222742111. I don't think we should represent the same concept as both a metric and a resource attribute.

rockdaboot commented 3 months ago

@mx-psi Thank you for the pointer. I just left a comment (https://github.com/open-telemetry/semantic-conventions/pull/99#discussion_r1683104887). My use case would be profiling, where we use the number of logical CPUs as part of the CO2 and $ cost calculations. Maybe we can leave this issue open for more opinions.

trask commented 3 months ago

hey @rockdaboot, I'm not sure if cpu cores can be a resource attribute today since it is potentially mutable (https://github.com/open-telemetry/opentelemetry-specification/pull/2384#issue-1150952467)

but I think this limitation is being worked on by the resources and entities SIG, see https://github.com/open-telemetry/community/blob/main/projects/resources-and-entities.md#problem-3-lack-of-mutable-attributes

rockdaboot commented 3 months ago

Hey @trask, thanks for the links, just my point of view...

> I'm not sure if cpu cores can be a resource attribute today since it is potentially mutable (open-telemetry/opentelemetry-specification#2384 (comment))

Well, isn't everything above the hardware level mutable during runtime!? For me, the question is where to draw the line, e.g. why are host.name, host.ip and similar attributes considered immutable but the number of cores is considered mutable?

We are talking about the host/OS level here, not about single processes. It does not seem relevant whether the Java runtime is able to pick up changed CPU values, or whether taskset or docker are able to limit CPU resources for single processes (as comments in #2384 suggest).
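To make the host-versus-process distinction concrete, here is a small sketch, assuming Python's standard library and a hypothetical helper `cpu_views`. `os.cpu_count()` reports the system-wide logical CPU count, while `os.sched_getaffinity(0)` (available on Linux and some other Unix systems) reports the CPUs the current process is allowed to run on, which a tool like taskset can shrink without changing the host value:

```python
import os


def cpu_views() -> dict:
    """Compare the host-level CPU count with what this process may use.

    Hypothetical helper for illustration only.
    """
    host = os.cpu_count()  # system-wide view; None if unknown
    if hasattr(os, "sched_getaffinity"):
        # Set of CPUs this process is allowed to run on (affinity mask).
        usable = len(os.sched_getaffinity(0))
    else:
        usable = None  # platform does not expose the affinity mask
    return {"host_logical": host, "process_usable": usable}


if __name__ == "__main__":
    print(cpu_views())
```

Running this under `taskset -c 0` on a multi-core Linux host should show a smaller `process_usable` value next to an unchanged `host_logical` value. Quota-based limits do not alter the affinity mask, so they are not visible this way.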

Let's see how the resources and entities SIG decides :)