open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[processor/resourcedetection, receiver/hostmetrics] Report total memory and CPU capacity numbers as resource attributes #22099

Closed mx-psi closed 1 year ago

mx-psi commented 1 year ago

Component(s)

processor/resourcedetection, receiver/hostmetrics

Describe the issue you're reporting

As part of improving the infrastructure monitoring capabilities of the OpenTelemetry Collector, I want to report total memory and filesystem capacity as well as the number of CPU cores.

Part of this information can already be retrieved by combining information from the hostmetrics receiver; for example, if you count the number of distinct cpu values on system.cpu.time, you get the total number of cores. However, if you want to compute this information at the exporter, you depend on all metrics reaching the same exporter, which makes your deployment stateful.
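A minimal sketch of the counting approach described above, in Go. The `dataPoint` type is a hypothetical, simplified stand-in for a data point of system.cpu.time and is not the Collector's own data model; the `cpu` attribute key is assumed to identify the core.

```go
package main

// dataPoint is a hypothetical, simplified stand-in for one data point of
// system.cpu.time. It is NOT the Collector's pdata type.
type dataPoint struct {
	Attributes map[string]string
	Value      float64
}

// countCores returns the number of distinct values of the "cpu" attribute
// across the given data points. Points without a "cpu" attribute are ignored.
func countCores(points []dataPoint) int {
	seen := make(map[string]struct{})
	for _, p := range points {
		if cpu, ok := p.Attributes["cpu"]; ok {
			seen[cpu] = struct{}{}
		}
	}
	return len(seen)
}
```

Note that this only gives the right answer if every data point for the host is visible in one place, which is exactly the statefulness concern raised here.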

I therefore want to add this information as resource attributes to avoid stateful deployments.

My remaining open question is where to add this. I see two possibilities:

  1. Make this part of the resource attributes on metrics generated by the host metrics receiver.
  2. Make this part of the resource attributes added by the resource detection processor system detector.

I am leaning towards (2), since that way this information can be leveraged by users that do not use the host metrics receiver but still want to have that kind of information, but I want some other opinions.
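As a rough illustration of what attaching capacity as a resource attribute could look like, here is a sketch in Go using only the standard library. The attribute name `host.cpu.logical_count` and the helper `withCPUCapacity` are hypothetical and chosen only for this example; no name had been agreed on in this thread. Total memory is left out because reading it portably needs OS-specific code.

```go
package main

import "runtime"

// cpuCountKey is a hypothetical attribute name used only for this sketch.
const cpuCountKey = "host.cpu.logical_count"

// withCPUCapacity returns a copy of the given resource attributes with the
// number of logical CPUs visible to the process added under cpuCountKey.
// The input map is not modified.
func withCPUCapacity(resource map[string]any) map[string]any {
	out := make(map[string]any, len(resource)+1)
	for k, v := range resource {
		out[k] = v
	}
	out[cpuCountKey] = int64(runtime.NumCPU())
	return out
}
```

Because the value travels with the resource, any signal (metrics, traces, logs) carrying that resource gets it without the backend having to join data from different streams.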

github-actions[bot] commented 1 year ago

Pinging code owners for receiver/hostmetrics: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for processor/resourcedetection: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmitryax commented 1 year ago

> I am leaning towards (2), since that way this information can be leveraged by users that do not use the host metrics receiver but still want to have that kind of information, but I want some other opinions.

Do you have a use case of metrics coming from another receiver that would need that attribute?

mx-psi commented 1 year ago

> > I am leaning towards (2), since that way this information can be leveraged by users that do not use the host metrics receiver but still want to have that kind of information, but I want some other opinions.
>
> Do you have a use case of metrics coming from another receiver that would need that attribute?

I guess I am thinking of something like debugging an issue by filtering a telemetry signal (say, traces) based on the total capacity on a given metric (similar to the use case for something like host.type). For most users, using host.type for this is more useful, but if you are not in a cloud environment or want to get really detailed, these could also be useful.

dmitryax commented 1 year ago

I believe these can be optional resource attributes on both the host metrics receiver and the resource detection processor, so the user can decide which scope to apply them to.

Another thing I would like to clarify is whether putting that information in resource attributes is a good approach. Do we have anything in the OTel specification/semconv to guide us here? If not, we should probably start there.

frzifus commented 1 year ago

> My remaining open question is where to add this. I see two possibilities:

Would it also be an option to provide this info as a metric? I would be interested in something like system.cpu.number. In my case, that would be the only metric I am interested in. With physical hardware, it's probably a bit boring. But with virtual machines, the number of available CPUs can be changed at runtime.

Update: That's what I had in mind: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/23231


cc @chambridge

dmitryax commented 1 year ago

> Would it also be an option to provide this info as a metric? I would be interested in something like system.cpu.number. In my case, that would be the only metric I am interested in. With physical hardware, it's probably a bit boring. But with virtual machines, the number of available CPUs can be changed at runtime.

Actually, I like this more than putting it into a resource attribute, because a metric can be used in computations on the backends, while that is hard to do with a resource attribute. But I would still like to see any guidance from the OTel spec regarding this. It'd be great if someone could look into that and open an issue if it's not specified anywhere.
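One example of the kind of backend computation this enables is normalizing a load average by the CPU count. The sketch below is in Go and only illustrative; the function name `loadPerCore` is hypothetical, and how a backend actually joins the two metrics is out of scope here.

```go
package main

import "errors"

// loadPerCore divides a load average by a CPU count that was reported as a
// metric value. It returns an error for non-positive counts so that callers
// do not divide by zero.
func loadPerCore(loadAvg float64, cpuCount int64) (float64, error) {
	if cpuCount <= 0 {
		return 0, errors.New("cpu count must be positive")
	}
	return loadAvg / float64(cpuCount), nil
}
```

With the count stored as a resource attribute instead, the backend would first have to parse the attribute into a number before it could be used this way.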

cc @mx-psi

mx-psi commented 1 year ago

Let's continue the discussion on the semantic-conventions repository first to clear up both the name and whether this should be a metric or a resource attribute. I will mark this as 'on hold' in the meanwhile.

diranged commented 5 months ago

@dmitryax, Coming back to this - I'd like to see these values as resource attributes because we're trying to map attributes from the OTEL Collector into Datadog via https://docs.datadoghq.com/opentelemetry/schema_semantics/host_metadata/#cpu-conventions. Is that something that's possible today now that the values exist as metrics? Or do we need another patch to potentially expose these as attributes? For what it's worth, I don't really understand why they were made metrics given that they don't generally change?