Open TylerHelmuth opened 11 months ago
/cc @jinja2 @dmitryax @povilasv
Did some digging:
Kubernetes Docs state:
// Total CPU usage (sum of all cores) averaged over the sample window. // The "core" unit can be interpreted as CPU core-nanoseconds per second. // +optional UsageNanoCores *uint64
json:"usageNanoCores,omitempty"
Looks like it's getting these metrics from CRI and if CRI doesn't have stats it's computing using this formula:
nanoSeconds := newStats.Timestamp - cachedStats.Timestamp
usageNanoCores := uint64(float64(newStats.UsageCoreNanoSeconds.Value-cachedStats.UsageCoreNanoSeconds.Value) /
float64(nanoSeconds) * float64(time.Second/time.Nanosecond))
Where:
// Cumulative CPU usage (sum across all cores) since object creation. UsageCoreNanoSeconds *UInt64Value
protobuf:"bytes,2,opt,name=usage_core_nano_seconds,json=usageCoreNanoSeconds,proto3" json:"usage_core_nano_seconds,omitempty"
:thinking:
Playing a bit with the formula:
Limit - is total available cpu time.
Let's say we collect every 1 second, and app uses total available cpu time so 1 second.
nanoSeconds := now() - (now() - 1s) = 1s = 1,000,000,000 nanoseconds
UsaeNanocores := (2,000,000,000 - 1,000,000,000) / 1,000,000,000 * 1,000,000,000 = 1,000,000,000
or simplified:
UsageNanocores := (2s - 1s) / 1s * float64(time.Second/time.Nanosecond)) = unit64(1 * float64(time.Second/time.Nanosecond))) = 1,000,000,000
Based on this example, the result is actual usage of 1,000,000,000 nano seconds or 1second.
So this metricunit seems to be nanoseconds, not percentage.
If my calculations are correct, I think we should rename to cpu.usage
with proper unit (nanoseconds)?
@povilasv thank you!
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
FYI @TylerHelmuth @povilasv In SemConv we have merged https://github.com/open-telemetry/semantic-conventions/pull/282 which adds the container.cpu.time
metric for now.
For the dockerstats
receiver we have https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31649 which will try to align the implementation with the added SemConvs.
Do we have a summary so far for what is missing from the kubeletstats
receiver in terms of naming changes (like this current issue)?
Shall we try to adopt the kubeletstats
receiver with https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31649? Happy to help with that.
At the moment the implementation of the receiver provides the following:
Are we planning to keep them all? Are those all allgined with https://github.com/open-telemetry/semantic-conventions/blob/71c2e8072596fb9a4ceb68303c83f5389e0beb5f/docs/general/metrics.md#instrument-naming?
From https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/25901/files#diff-3343de7bfda986546ce7cb166e641ae88c0b0aecadd016cb253cd5a0463ff464R352-R353 I see we are going to remove/deprecate container.cpu.utilization
?
Could we keep it instead as optional metric and find a proper way to calculate it? I see it was mentioned at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/25901#issuecomment-1776708553 but not sure how it was resolved. I guess that would be possible by adding a cpuNodeLimit
(retrieved from the Node Resource) at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/80bbf5e540bbcddf69a5d065e4962471ce572e60/receiver/kubeletstatsreceiver/internal/kubelet/metadata.go#L71 similarly with set resource limit. I drafted a very WiP implementation for this to illustrate the point -> https://github.com/ChrsMark/opentelemetry-collector-contrib/commit/27ce7691af67933097ea0defe2bf0f646d19a6ea
@ChrsMark in my opinion yes to all questions. We want to be aligned with the spec (although I'd love to reduce the number of iterations in the receivers to gain that alignment, how long till we're stable lol).
I don't have a lot of time to dedicate to getting kubeletstatsreceiver update-to-date with the non-stable spec. At this point I was planning to wait for things to stabilize before making any more changes besides the work we started in this issue.
Thank's @TylerHelmuth, I see the point of not chasing after an unstable schema/spec.
Just to clarify regarding the container.cpu.utilization
though: shall we abort its deprecation and try to calculate this properly as it was mentioned at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/25901#issuecomment-1776708553? This can happen based on the cpuNodeLimit
, and seems to be doable based on a quick research I did: https://github.com/ChrsMark/opentelemetry-collector-contrib/commit/27ce7691af67933097ea0defe2bf0f646d19a6ea
@ChrsMark yes I'd be fine with keeping the metric if we can calculate it correctly. We'd still need to go through some sort of feature-gate processor to make it clear to users that the metric has changed and that if they want the old value they need to use the new .usage
metric.
@TylerHelmuth @povilasv I have drafted a patch to illustrate the point at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32295. My sightings look promising :).
If we agree on the idea I can move the PR forward to fix the details and open it for review. Let me know what you think.
Seems reasonable. @jinja2 please take a look
This looks reasonable for me as well. I would add the informer too so we don't call get Node everytime we scrape data, in practice Node cpu capacity doesn't change
Thank's for the feedback folks!
I have adjusted https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32295 to use an informer instead of getting the Node on every scrape. Something I noticed is that we have several places where there are different informer based implementations, like in k8sclusterreceiver
and k8sattributesprocessor
. Maybe it would make sense to extract that common logic into a common lib and re-use it from the various receivers and processors (even in resourcedetectorprocessor
), but we can file a different issue for this.
For now we can continue any discussion at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32295.
Hey folks. Sorry, I'm late to the party.
I am not convinced about repurposing the container.cpu.utilization
to be a ratio of the node cpu limit:
k8s.container.cpu_limit_utilization
gives pretty good idea. I can imagine it to be utilization of some container limit, not node's limit. container.cpu.utilization
name is generic and can be used in docker receiver as well. Can we always easily retrieve number of host CPUs there? I'm not sure about it.I think something like k8s.container.node_limit_utilization
would be a better name here. It's clear and consistent with other utilization
metrics.
Even if we decide to repurpose container.cpu.utilization
, I would strongly suggest deprecating->disabling and maybe removing it first instead of just changing its meaning. I believe the metrics based on the node limit should be optional, and container.cpu.time
or container.cpu.usage
should be enabled by default instead.
cc @ChrsMark @TylerHelmuth @povilasv
Thank's @dmitryax. I agree with making the metric more specific here.
There is a similar distinction in meatricbeat as well, with *.node.pct
and *.limit.pct
accordingly.
If others agree, I can change the PR accordingly to:
k8s.container.cpu.node_limit_utilization
.container.cpu.utilization
as deprecated.k8s.pod.cpu.node_limit_utilization
as part of the same PR.k8s.node.cpu.utilization
since we can use it as is I guess we will need to go through the feature flag path in a separate PR.Extra: I wonder also if that would make sense to change k8s.container.cpu_limit_utilization
to k8s.container.cpu.limit_utilization
so as to be consistent with having k8s.container.cpu.*
as a clear namespace.
@dmitryax @TylerHelmuth @povilasv let me know what you think.
I also agree with @dmitryax points too. I guess computing against node limits has problems and is unfair if node limits > container limits, etc. So we need to do proper deprecation.
Regarding k8s.container.cpu.node_limit_utilization
, is this useful? if you run many pods on a Node and they start consuming cpu and linux gives all of them equal share, your Pod wont report 100 % k8s.container.cpu.node_limit_utilization
and yet it wont be able to get more cpu.
Regarding k8s.container.cpu.node_limit_utilization, is this useful
I still find this useful in order to be able to compare how much cpu utilization a container/pod has against the Node's capacity. This helps you understand if a Pod/container is a really problematic workload or not. Not against its own limit but against the Node's capacity.
For example, you can see 96% limit_utilization
from container A and 96% limit_utilization
from container B. But if those 2 have different resource limits you don't get a realistic overview of which one of them consumes more against the Node.
What does "node_limit" mean here? Is it referring to the host's capacity or the node allocatable (capacity - system/kube reserved)? I find node_limit to be ambiguous but my assumption would be that it refers to allocatable since that's the amount of resources actually available for pod scheduling. Do you think a user might want to select whether the utilization is against the capacity or the allocatable?
Even if we decide to repurpose container.cpu.utilization, I would strongly suggest deprecating->disabling and maybe removing it first instead of just changing its meaning. I believe the metrics based on the node limit should be optional, and container.cpu.time or container.cpu.usage should be enabled by default instead.
I am good with this. I care mainly about the switch from *.cpu.utilization
to *.cpu.usage
.
Do you think a user might want to select whether the utilization is against the capacity or the allocatable?
I would go with the capacity
since the allocatable
is kind of k8s specific and is another "limit" to my mind. I think users would be fine to see the actual utilization against the actual node's capacity. This would not require extra knowledge like the allocatable configuration etc.
I will change the PR for now to to use the k8s.container.cpu.node_limit_utilization
and we can continue the discussion there.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Any thoughts about the k8s.node.cpu.utilization? Should this be calculated properly instead of deprecated? This one should be straightforward compared to the container's/pod's ones since the node does not have the respective request/limit based metrics.
So it would just be
k8s.node.cpu.utilization = k8s.node.cpu.usage / num_of_cores
Sounds good to me, if we can compute it properly
I wonder though if we should first deprecate the metrics and re-introduce this specific one back as it was pointed out at https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27885#issuecomment-2089636637.
Otherwise we can fix this one directly.
I personally, would be okay if we just fix it, but not sure what others think?
The thing is that change the way it is calculated requires enabling the NodeInformer
. So if we change this now as is, any existing users' configurations will become invalid and the metric will being reported as 0
. This is highlighted by the failing tests at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34191.
So what I would suggest here is:
*.cpu.usage
metrics enabled by default: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34217*.cpu.utilization
metrics disabled by default.k8s.node.cpu.utilization
, k8s.pod.cpu.utilization
and container.cpu.utilization
metrics.k8s.node.cpu.utilization
with https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34191, which will be disabled by default.The above should be done in concrete/standalone PRs and can be split across different releases to ensure a gradual switch process for the users.
WDYT?
/cc @TylerHelmuth @dmitryax
So if we change this now as is, any existing users' configurations will become invalid and the metric will being reported (will be 0).
Ya that'd be bad.
I like this plan as it works toward the original goal (using the proper name) and allows us to work on an actual utilization metric separately. My only concern is that I really didn't want to do breaking semantic convention changes to the k8s components until the k8s semconv was stable, so that all the changes could be hidden behind one feature flag.
That isn't happening anytime soon tho. I think starting with enabling cpu.usage
by default is a good next step.
@ChrsMark, I like that outlined plan. Thanks for putting it together! However, I'd prefer combining 1 and 2 in one release so we don't change the cardinality. cc @TylerHelmuth
@dmitryax If we do 1 and 2 in one release we need more evangelism around the change. The usage
metrics and their warnings have been around for a long time, but the change wasn't very public and I worry users aren't adopting it.
I think we'd need to update to the README, add a warning in the release notes, and update the printed warning that the value will "swap" in a specific future release. Maybe write a blog post too.
I think we'd need to update to the README, add a warning in the release notes, and update the printed warning that the value will "swap" in a specific future release. Maybe write a blog post too.
@TylerHelmuth @dmitryax Should we then first do those in a step 0 in order to later combine step 1 and step 2? I'd be fine with that and taking care of it.
Sure
@TylerHelmuth cool! Any preferences on what the target release of this change/deprecation should be? Based on the release-schedule I would go with v0.111.0
(2024-10-07
).
Is there a way for use to add a feature gate or something that causes the collector to fail on start if utilization
is used? Breaking changes from semantics are so difficult because a user could upgrade the collector and not realize it just started emitting new telemetry and in this case it is a metric that has been a core metric from this receiver for 4+ years.
It would be great it we can add a set where the collector fails to start if utilization is in use, and the user could disable the feature gate to get back to a running state, but at least they'd have to acknowledge the change is coming on a specific date.
It would be great it we can add a set where the collector fails to start if utilization is in use, and the user could disable the feature gate to get back to a running state, but at least they'd have to acknowledge the change is coming on a specific date.
Hmm having the Collector to fail completely for one metric sounds a bit concerning to me tbh. Has this approach been used in other components too?
How about making the switch completely behind a feature gate then?
This way we enable users to switch whenever they want prior to a specific release and then after we have make the switch the feature gate to continue using the utilization
will still be there for some period. We can define a plan like the following:
alpha
- disabled by default.beta
in v0.y.0 - enabled by default. (for configs that still enable the utilization
it can fail)stable
in v0.z.0 - cannot be disabled.stable
.(we have something similar in hostmetricsreceiver
)
Wouldn't that cover most of what we have discussed here?
Yes thats what I'm thinking. If the feature gate is enabled and a utilization
metric is enabled, the collector will fail to start.
Thank's @TylerHelmuth! I will be off for the following weeks, but I will pick it up once I'm back.
@TylerHelmuth @dmitryax I have tried the feature gate approach at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35139. Please have a look once you get the chance. You can find more details about the feature gate's plan in the PR's description.
Component(s)
receiver/kubeletstats
Is your feature request related to a problem? Please describe.
The Kubeletestats Receiver currently uses
*.cpu.utilization
as the name for cpu metrics that report the CPUStatsUsageNanoCores
value.I believe that
UsageNanoCores
reports the actual amount of cpu being used not the ratio of the amount being used out of a total limit. If this is true, then our use ofutilization
is not meeting semantic convention exceptions.I would like to have a discussion about what exactly
UsageNanoCores
represents and if our metric naming needs updating.Related to discussion that started in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24905