kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

Add metrics for orphan nodes #694

Closed mohankumarmani closed 2 months ago

mohankumarmani commented 1 year ago

Version

Karpenter Version: v0.27.0

Kubernetes Version: v1.23

We see some nodes created by Karpenter stay orphaned, with logs such as:

`controller.inflightchecks Inflight check failed for node, Expected resource "memory" didn't register on the node`

`controller.inflightchecks Inflight check failed for node, Expected resource "ephemeral-storage" didn't register on the node`

Is there a metric that shows how often this happens, or which nodes are affected? Unless we check the cluster directly, we have no way of knowing.

This may be fixed in a future version, but Karpenter should still expose a metric for it, so that we can tell when nodes it planned to provision are unavailable for any reason.
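Until such a metric exists, one possible stopgap is to count the failed inflight checks straight from the controller logs. The sketch below is a minimal illustration, not part of Karpenter: it matches only the message text quoted above and assumes nothing about the rest of the log line (timestamps, node names, JSON fields). The function name `count_unregistered_resources` is hypothetical.

```python
import re
from collections import Counter

# Matches the message text quoted in this issue. The rest of the log line
# (timestamp, level, node name) is deliberately not assumed.
FAILED_CHECK = re.compile(
    r'Inflight check failed for node, Expected resource "([^"]+)" '
    r"didn't register on the node"
)


def count_unregistered_resources(lines):
    """Return a Counter mapping resource name -> number of failed-check lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_CHECK.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'controller.inflightchecks Inflight check failed for node, '
        'Expected resource "memory" didn\'t register on the node',
        'controller.inflightchecks Inflight check failed for node, '
        'Expected resource "ephemeral-storage" didn\'t register on the node',
    ]
    for resource, count in count_unregistered_resources(sample).items():
        print(resource, count)
```

The per-resource counts could then be fed into whatever alerting is already in place. This only approximates a real metric, since it depends on log retention and on the message wording staying the same across versions.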

jonathan-innis commented 1 year ago

@mohankumarmani Have you tried upgrading to v0.28.1 to see if this solves the orphaned node issue entirely? Karpenter now has a built-in static timeout: nodes must register with the cluster within 15m of launch. If a node doesn't, Karpenter will auto-terminate the Machine and attempt to launch another one.
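The registration timeout described above boils down to a simple check. The following is a rough sketch of that decision only, not Karpenter's actual implementation: the names `REGISTRATION_TIMEOUT` and `should_terminate` are hypothetical, and treating a node that has waited exactly 15 minutes as expired is an assumption.

```python
from datetime import datetime, timedelta, timezone

# The static 15m window mentioned in the comment above.
REGISTRATION_TIMEOUT = timedelta(minutes=15)


def should_terminate(launched_at, registered, now):
    """Return True if the machine never registered and the timeout has elapsed.

    A machine that has already registered is never terminated by this check,
    no matter how long ago it was launched.
    """
    if registered:
        return False
    return now - launched_at >= REGISTRATION_TIMEOUT


if __name__ == "__main__":
    launched = datetime(2023, 7, 1, 12, 0, tzinfo=timezone.utc)
    # Still inside the window: keep waiting.
    print(should_terminate(launched, False, launched + timedelta(minutes=10)))
    # Window elapsed without registration: terminate and relaunch.
    print(should_terminate(launched, False, launched + timedelta(minutes=16)))
```

A controller would evaluate this on each reconcile and, when it returns true, delete the Machine so that a replacement can be launched.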

For nodes that go NotReady after registering to the cluster, a separate flow has been proposed; it is captured in aws/karpenter-core#750.

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jonathan-innis commented 7 months ago

/remove-lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

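This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)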

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/karpenter/issues/694#issuecomment-2198492902):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.