Open Yun-Ting opened 1 year ago
the definition of process.cpu.utilization is "Difference in process.cpu.time since the last measurement, divided by the elapsed time and number of CPUs available to the process", which requires maintaining the state (in this case, the last collection time) of the instrument.
I think this is a really interesting point.
It seems like process.cpu.utilization is defined as a "delta metric" and may behave strangely in the case of multiple metric readers.
Maybe it's good to remove process.cpu.utilization altogether, in favor of process.cpu.time + process.cpu.count?
It seems like process.cpu.utilization is defined as a "delta metric" and may behave strangely in the case of multiple metric readers.
@alanwest made a similar point in https://github.com/open-telemetry/opentelemetry-specification/issues/2439#issuecomment-1353938369:
In the event of multiple meter providers, their reporting intervals may be different. So, calculating the difference in process.cpu.time since the last measurement requires the instrumentation to maintain some state per meter provider. I don't think the metric API spec offers the support necessary for this.
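The concern quoted above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `StatefulUtilization`, `cpu_time`, and the collection schedule are hypothetical names and values made up for illustration, not part of any OpenTelemetry API. It models an instrument that keeps a single "last measurement" and is collected by two readers, A and B, on different schedules.

```python
class StatefulUtilization:
    """Hypothetical instrument: delta(cpu_time) / (delta(wall_time) * cpu_count)."""

    def __init__(self, cpu_count, start_wall=0.0, start_cpu=0.0):
        self.cpu_count = cpu_count
        self.last_wall = start_wall
        self.last_cpu = start_cpu

    def observe(self, wall_now, cpu_now):
        # Utilization since the last call, whoever made that call.
        elapsed = wall_now - self.last_wall
        used = cpu_now - self.last_cpu
        self.last_wall, self.last_cpu = wall_now, cpu_now
        return used / (elapsed * self.cpu_count)


def cpu_time(t):
    # Assumed workload: one CPU fully busy for the first 10 s, idle afterwards.
    return min(t, 10.0)


# Assumed schedule: reader A collects at 10, 20, 30 s; reader B collects at 25 s.
schedule = [(10.0, "A"), (20.0, "A"), (25.0, "B"), (30.0, "A")]

# Case 1: one shared state for both readers.
shared = StatefulUtilization(cpu_count=2)
shared_readings = {}
for t, reader in schedule:
    shared_readings[(reader, t)] = shared.observe(t, cpu_time(t))

# Case 2: separate state per reader.
per_reader = {"A": StatefulUtilization(2), "B": StatefulUtilization(2)}
isolated_readings = {}
for t, reader in schedule:
    isolated_readings[(reader, t)] = per_reader[reader].observe(t, cpu_time(t))

print("shared state, reader B at 25 s:", shared_readings[("B", 25.0)])
print("per-reader state, reader B at 25 s:", isolated_readings[("B", 25.0)])
```

With shared state, reader B only sees the window since reader A's collection at 20 s, so it misses the busy period entirely. With per-reader state it sees its own full 25 s window.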
It seems like process.cpu.utilization is defined as a "delta metric" and may behave strangely in the case of multiple metric readers.
Agree. It would instead be useful if process.cpu.utilization represented the real-time percentage CPU utilization for a given process (as provided in the %CPU column of the top command on POSIX), as an UpDownCounter. Or maybe have a separate metric for that. For C++ applications on Linux, process.cpu.count would mostly be the total number of cores in the machine, and the same value would be recorded each time. It is probably more useful for the .NET runtime or the JVM, if they report more accurately the number of processors/cores allocated to them. Nothing against this PR though :)
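To make the distinction between "cores in the machine" and "CPUs available to the process" concrete, here is a minimal sketch in Python. The helper name `available_cpu_count` is hypothetical. It relies on `os.sched_getaffinity`, which exists only on some platforms (notably Linux), and falls back to `os.cpu_count()` elsewhere. It does not account for container CPU quotas, which is a further assumption a real implementation would need to revisit.

```python
import os


def available_cpu_count():
    """Return the number of CPUs this process may run on (hypothetical helper)."""
    if hasattr(os, "sched_getaffinity"):
        # Size of the CPU affinity set of the current process (pid 0 = self).
        return len(os.sched_getaffinity(0))
    # Fallback: machine-wide logical CPU count; cpu_count() may return None.
    return os.cpu_count() or 1


total = os.cpu_count() or 1
available = available_cpu_count()
print(f"machine CPUs: {total}, CPUs available to this process: {available}")
```

On an unrestricted process the two numbers are typically equal, which matches the observation above that the same value would be recorded each time.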
Maybe it's good to remove process.cpu.utilization altogether, in favor of process.cpu.time + process.cpu.count?
At a minimum, the spec needs to clarify that it is not possible for instrumentation written using OTel's language APIs to generate the process.cpu.utilization metric (at least as it is currently defined).
Regarding removing it altogether, I'm not sure.
The collector's hostmetrics receiver generates the process.cpu.utilization metric.
It is my understanding that the semantic conventions of the spec are not limited to only describing metrics produced using language APIs. If the collector produces this metric, and its semantics match what is here in the spec, then shouldn't we leave it in the spec?
It would instead be useful if process.cpu.utilization represented the real-time percentage CPU utilization for a given process (as provided in the %CPU column of the top command on POSIX), as an UpDownCounter.
I agree. This sounds like the semantics of the process.runtime.jvm.cpu.utilization metric:
Recent CPU utilization for the process. [2]
[2]: These utilizations are not defined as being for the specific interval since last measurement (unlike system.cpu.utilization).
Unless there is good reason not to, I would prefer to redefine process.cpu.utilization to be the same as Java's process.runtime.jvm.cpu.utilization. If this is not possible, then perhaps all languages could have a process.runtime.*.cpu.utilization metric.
Regarding process.cpu.count, I'm also not against it, but if its primary purpose is to enable computing utilization, then I think it would be ideal to just produce utilization directly.
From the spec triage meeting: This looks like a reasonable request but the right approach still needs to be decided on, which is already being discussed in this thread.
cc @open-telemetry/instr-wg for your input on this
Hi @open-telemetry/instr-wg, may I kindly have your take on this? Thank you.
This should be transferred to https://github.com/open-telemetry/semantic-conventions
@open-telemetry/semconv-system-approvers can you please take a look at this and see if it needs more info or can be added to the system semconv project? Thanks!
What are you trying to achieve?
Add a metric that exposes the number of processors available to the current process to the semantic conventions for OS process metrics: process.cpu.count
Currently, the definition of process.cpu.utilization is "Difference in process.cpu.time since the last measurement, divided by the elapsed time and number of CPUs available to the process", which requires maintaining the state (in this case, the last collection time) of the instrument.
The challenge encountered during implementation in .NET is: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/issues/831
Potential workarounds: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/948
What did you expect to see?
Add the process.cpu.count metric to the semantic conventions and let the backend do the computation. Given instrument values of process.cpu.time and process.cpu.count, the backend will have sufficient data to calculate the CPU utilization metric. https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/981
Additional context.
Previous discussion related to this topic: https://github.com/open-telemetry/opentelemetry-specification/pull/2392
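The backend-side computation proposed under "What did you expect to see?" could look roughly like the following. This is a sketch, not an actual backend implementation: the `Sample` type and the `utilization` function are hypothetical, and it assumes that process.cpu.time arrives as a cumulative value in seconds alongside a timestamp and the process.cpu.count value. It applies the definition quoted in the issue: difference in CPU time, divided by elapsed time and number of CPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical data point as a backend might store it."""
    timestamp_s: float   # collection time, in seconds
    cpu_time_s: float    # cumulative process.cpu.time, in seconds
    cpu_count: int       # process.cpu.count at collection time


def utilization(prev: Sample, curr: Sample) -> float:
    """CPU utilization between two samples, as a fraction in [0, 1]."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if curr.cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    # Assumption: the CPU count of the later sample applies to the whole window.
    return (curr.cpu_time_s - prev.cpu_time_s) / (elapsed * curr.cpu_count)


# Example: 30 s of CPU time consumed over a 60 s window on 4 CPUs.
prev = Sample(timestamp_s=100.0, cpu_time_s=12.0, cpu_count=4)
curr = Sample(timestamp_s=160.0, cpu_time_s=42.0, cpu_count=4)
print(utilization(prev, curr))  # 30 / (60 * 4) = 0.125
```

Because the backend chooses which two samples to compare, the instrument itself no longer has to remember when it was last collected.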