kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

Mega Issue: Karpenter Observability (metrics, logs, eventing, etc.) #1051

Open jonathan-innis opened 8 months ago

jonathan-innis commented 8 months ago

Description

What problem are you trying to solve?

As part of the journey to v1, I'd like us to consider the holistic story of what we are doing with our metrics, logging, eventing, and status fields (status conditions, etc.) across the codebase. So far we have been adding metrics, logging, and eventing piecemeal, but we haven't done a holistic review of the whole story or given recommendations on how users of Karpenter should monitor it and what they should alert on (outside of the Grafana dashboard in our documentation).

This issue is meant to be a mega-issue for capturing all of the other issues in the repo that are considering changes or improvements to the current metrics and monitoring story:

Bryce-Soghigian commented 7 months ago

Also https://github.com/kubernetes-sigs/karpenter/issues/712

tallaxes commented 6 months ago

Surfacing total node count per NodePool - likely via NodePool.status - has been requested (cf. Workgroup Meeting 2024-05-09)

jan-ludvik commented 3 months ago

It would be really great if Karpenter could expose the total real-time cluster cost, like eks-node-viewer does. Here is its cluster summary, with the cluster cost on the right:

44 nodes (902794m/1056270m) 85.5% cpu ██████████████████████████████████░░░░░░ $25.082/hour | $18,310.152/month 
2,072 pods (44 pending 2,028 running 2,035 bound)

I am aware of the karpenter_cloudprovider_instance_type_offering_price_estimate metric, but I don't know how to calculate cluster cost from it in Datadog.
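
A minimal sketch of the calculation being asked about, not Karpenter's or eks-node-viewer's implementation: count the nodes per (instance type, capacity type, zone) offering and multiply each count by that offering's hourly price estimate. It assumes the price metric is keyed by those three dimensions and that node records can be exported with matching fields, for example from the well-known node labels. The function name, the sample prices, and the 730-hour month are hypothetical choices for illustration (730 roughly matches the hourly-to-monthly ratio in the summary above). It does not show the Datadog query itself.

```python
from collections import Counter

# Assumption: a month is approximated as 730 hours.
HOURS_PER_MONTH = 730


def cluster_cost_per_hour(nodes, price_estimates):
    """Sum the hourly price estimate over all running nodes.

    nodes: iterable of (instance_type, capacity_type, zone) tuples, one per node.
    price_estimates: dict mapping the same tuple to a USD/hour estimate
        (hypothetical export of the price estimate metric).
    """
    counts = Counter(nodes)
    missing = [offering for offering in counts if offering not in price_estimates]
    if missing:
        # Fail loudly rather than silently under-reporting the cost.
        raise KeyError(f"no price estimate for offerings: {missing}")
    return sum(price_estimates[offering] * n for offering, n in counts.items())


# Hypothetical sample data; the prices are made up for illustration.
prices = {
    ("m5.large", "on-demand", "us-west-2a"): 0.5,
    ("m5.large", "spot", "us-west-2b"): 0.25,
}
nodes = [
    ("m5.large", "on-demand", "us-west-2a"),
    ("m5.large", "on-demand", "us-west-2a"),
    ("m5.large", "spot", "us-west-2b"),
]

hourly = cluster_cost_per_hour(nodes, prices)
monthly = hourly * HOURS_PER_MONTH
print(f"${hourly:.3f}/hour | ${monthly:,.3f}/month")
```

In a metrics backend the same join would need a per-node series carrying the instance type, capacity type, and zone as tags, so that it can be matched against the price series on those tags and summed.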