Open ElfoLiNk opened 6 months ago
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
FYI, I've opened a PR on semconv for the last terminated reason -> https://github.com/open-telemetry/semantic-conventions/issues/922 and it looks like my PR needs some refactoring. So this time, let's first agree on whether we want this and then make a PR to semconv.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
I see that the k8s.container.status.current_waiting_reason property has been added to the Semantic Conventions.
Do we need to wait for any more checks before drafting a PR?
I am happy to contribute if required.
FYI, this was reverted in https://github.com/open-telemetry/semantic-conventions/pull/1115
See the discussion in the original PR: https://github.com/open-telemetry/semantic-conventions/pull/997
People keep asking me about this issue, so I think we should solve it somehow in OTEL.
I'm thinking of proposing a simple 0/1 state metric to track whether a container is waiting for something. This is what Kube State Metrics does with its kube_pod_container_status_waiting metric.
My proposal is this:
k8s.container.status.waiting:
  enabled: false
  description: Whether a container is in the waiting state (0 for no, 1 for yes)
  gauge:
    value_type: int
@TylerHelmuth / @dmitryax thoughts?
I think we already have similar metrics in the Cluster Receiver, so it should fit our current model. Example:
k8s.container.ready:
  enabled: true
  description: Whether a container has passed its readiness probe (0 for no, 1 for yes)
  unit: ""
  gauge:
    value_type: int
I actually ran into this the other week as well and would like a solution. I thought the semantic convention SIG was blocking us on entities?
Initially I wanted to add a resource attribute, k8s.container.status.current_waiting_reason, which holds the actual reason why the container is in the waiting state. Example: k8s.container.status.current_waiting_reason=CrashLoopBackOff.
This didn't work because resource attributes are immutable.
This new PR does something different: it adds an enum metric that checks whether the container is in the waiting state. So it's a metric that tracks container state, but it doesn't tell you the reason.
Given the current OTEL model, the actual reason will probably go to Entities as a non-identifying attribute :thinking: A waiting-state metric IMO still makes sense and is useful.
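To make the split between the 0/1 waiting metric and the waiting reason concrete, here is a minimal Go sketch. It is not the receiver's actual code: `containerState` and `waitingState` are hypothetical, trimmed-down stand-ins for the Kubernetes container state types, and `waitingValue` and `waitingReason` are illustrative helper names. The only assumption carried over from Kubernetes is that a waiting container has a non-nil waiting entry with a reason string.

```go
package main

import "fmt"

// waitingState is a hypothetical stand-in for the Kubernetes waiting-state
// details; only the reason is modeled here.
type waitingState struct {
	Reason string // e.g. "CrashLoopBackOff"
}

// containerState is a hypothetical, simplified container state.
// Waiting is non-nil only while the container is in the waiting state.
type containerState struct {
	Waiting *waitingState
}

// waitingValue returns the proposed gauge value:
// 1 if the container is waiting, 0 otherwise.
func waitingValue(s containerState) int64 {
	if s.Waiting != nil {
		return 1
	}
	return 0
}

// waitingReason returns the waiting reason and true if the container is
// waiting, or an empty string and false otherwise. In the discussion above
// this value would not be a metric or resource attribute; it would more
// likely live on an entity as a non-identifying attribute.
func waitingReason(s containerState) (string, bool) {
	if s.Waiting == nil {
		return "", false
	}
	return s.Waiting.Reason, true
}

func main() {
	crashing := containerState{Waiting: &waitingState{Reason: "CrashLoopBackOff"}}
	running := containerState{}

	fmt.Println("crashing waiting value:", waitingValue(crashing))
	fmt.Println("running waiting value:", waitingValue(running))
	if reason, ok := waitingReason(crashing); ok {
		fmt.Println("crashing waiting reason:", reason)
	}
}
```

With this shape, an alert for stuck containers only needs the gauge, while the reason can be attached later without changing the metric.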
Component(s)
receiver/k8scluster
Is your feature request related to a problem? Please describe.
I would like to get container state metrics that expose the waiting reason. One use case is to know whether a container is in CrashLoopBackOff.
Example happening in pod:
Kube State Metrics has this modelled as this Prometheus metric:
Ref: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/workload/pod-metrics.md
So it would be great to have a similar metric.
Describe the solution you'd like
https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/pod.go#L554-L578
Describe alternatives you've considered
No response
Additional context
No response