open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0

`process.runtime.cpu.utilization` values are between 0 and 100, should be 0 and 1 #2810

Closed alexmojaki closed 1 month ago

alexmojaki commented 1 month ago

What happened?

The process.runtime.cpu.utilization system metric should have values between 0 and 1, based on the spec:

utilization - an instrument that measures the fraction of usage out of its limit should be called entity.utilization. For example, system.memory.utilization for the fraction of memory in use. Utilization can be with respect to a fixed limit or a soft limit. Utilization values are represented as a ratio and are typically in the range [0, 1], but may go above 1 in case of exceeding a soft limit.

(https://opentelemetry.io/docs/specs/semconv/general/metrics/#instrument-naming)

Instead, the reported values are between 0 and 100. By contrast, system.cpu.utilization already reports values in the correct range.

Steps to Reproduce

from opentelemetry.instrumentation.system_metrics import SystemMetricsInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import InMemoryMetricReader

reader = InMemoryMetricReader()
meter_provider = MeterProvider(metric_readers=[reader])
config = {
    'process.runtime.cpu.utilization': None,
    'system.cpu.utilization': ['user'],
}
instrumentor = SystemMetricsInstrumentor(config=config)
instrumentor.instrument(meter_provider=meter_provider)

# Take an initial reading which will always record 0 for the process metric,
# see https://github.com/open-telemetry/opentelemetry-python-contrib/issues/2797#issuecomment-2298749008
reader.collect()

# Use some CPU for some time to get a high reading.
# Summing the range directly keeps the CPU busy without building a huge list in memory.
sum(range(int(1e8)))

metrics_data = reader.get_metrics_data()
for resource_metric in metrics_data.resource_metrics:
    for scope_metric in resource_metric.scope_metrics:
        for metric in scope_metric.metrics:
            print(metric.name)
            print(sorted([round(data_point.value, 2) for data_point in metric.data.data_points]))

Actual Result

system.cpu.utilization
[0.0, 0.0, 0.0, 0.0, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.07, 0.1, 0.11, 0.13, 0.16, 0.26]
process.runtime.cpython.cpu.utilization
[100.0]

Expected Result

Under process.runtime.cpython.cpu.utilization it should print something like [1.0], not [100.0].
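A check along these lines could make the expectation explicit in a test. This is only a sketch: `out_of_range` is a hypothetical helper, not part of the instrumentation, and the upper bound of 1.0 is an assumption, since the spec allows utilization to exceed 1 when a soft limit is exceeded.

```python
from typing import Iterable, List


def out_of_range(values: Iterable[float], upper: float = 1.0) -> List[float]:
    """Return the values that fall outside the closed interval [0, upper].

    Hypothetical helper for illustration; `upper` defaults to 1.0 on the
    assumption that the metric is a plain ratio with a fixed limit.
    """
    return [value for value in values if not 0.0 <= value <= upper]


# Data point values taken from the "Actual Result" output above.
system_values = [0.0, 0.01, 0.05, 0.26]
process_values = [100.0]

print(out_of_range(system_values))   # nothing flagged
print(out_of_range(process_values))  # the 100.0 reading is flagged
```

With the fix in place, the process metric would be expected to produce an empty list here, just as the system metric does.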

Additional context

This code:

https://github.com/open-telemetry/opentelemetry-python-contrib/blob/dda369b7247919b8d4351b6a2535c7ad9e7f0fc0/instrumentation/opentelemetry-instrumentation-system-metrics/src/opentelemetry/instrumentation/system_metrics/__init__.py#L726-L734

should divide the value by 100, similar to this:

https://github.com/open-telemetry/opentelemetry-python-contrib/blob/dda369b7247919b8d4351b6a2535c7ad9e7f0fc0/instrumentation/opentelemetry-instrumentation-system-metrics/src/opentelemetry/instrumentation/system_metrics/__init__.py#L433-L448

This is because psutil returns values in the 0-100 range.
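The proposed change amounts to a single division. The sketch below is not the instrumentation's actual callback; `FakeProcess` and `get_runtime_cpu_utilization` are hypothetical names used so the example runs without psutil installed. The only psutil behavior assumed is that `Process.cpu_percent()` returns a percentage rather than a ratio.

```python
class FakeProcess:
    """Hypothetical stand-in for psutil.Process, used only for illustration."""

    def __init__(self, percent: float) -> None:
        self._percent = percent

    def cpu_percent(self) -> float:
        # Mirrors psutil.Process.cpu_percent(), which reports a percentage.
        return self._percent


def get_runtime_cpu_utilization(process) -> float:
    """Convert the percentage reading to the ratio the spec asks for."""
    return process.cpu_percent() / 100


print(get_runtime_cpu_utilization(FakeProcess(100.0)))
print(get_runtime_cpu_utilization(FakeProcess(50.0)))
```

A real `psutil.Process()` could be passed in place of `FakeProcess`, since the function only relies on the `cpu_percent()` method.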

Would you like to implement a fix?

No

mrugeshmaster commented 1 month ago

I would like to work on this issue

rissh commented 1 month ago

Hey @lzchen,

I would also like to try working on this issue. Could you please assign it to me? I'm open to feedback and would appreciate any suggestions or guidance as I work on it.

Thank you!

lzchen commented 1 month ago

@rissh

I believe @mrugeshmaster was the first to ask to work on this issue. You can reach out to them if you want to collaborate, or wait for their PR so you can review it.