Open rifelpet opened 9 months ago
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/kind flake
The Karpenter prow jobs are flaky because of Karpenter's aggressive node churn.
This run failed while waiting for the cluster to pass validation. According to the Karpenter logs, nodes were terminated during validation:
This run passed validation but had flaky e2e tests that timed out while waiting for pods to be scheduled and running. The Karpenter logs show multiple node terminations and launches during the e2e suite:
Example test failure because of pods pending:
The k/k e2e suite also inspects the state of the cluster at the beginning of the run to determine the number of nodes, zone spread, etc., so node churn after that point can invalidate what the suite assumes about the cluster.
We should aim for node stability during e2e tests. We currently enable Karpenter's consolidation but should probably disable it for e2e:
https://github.com/kubernetes/kops/blob/03779878069ec0362e33fe308d3dd789e8d516d5/upup/models/cloudup/resources/addons/karpenter.sh/k8s-1.19.yaml.template#L1815-L1817
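For illustration, here is a minimal sketch of what disabling consolidation could look like on a v1alpha5 Provisioner. It assumes the linked template lines set the Provisioner's `spec.consolidation.enabled` field to true; the resource name `default` is a placeholder, and other required Provisioner fields are omitted.

```yaml
# Sketch only: a v1alpha5 Provisioner fragment with consolidation turned off.
# The name "default" is a placeholder, not taken from the kops template.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    # Assumption: the linked template currently renders this as true.
    enabled: false
```

With consolidation off, Karpenter should no longer replace or remove underutilized nodes mid-run, though it would still launch nodes for pending pods.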
Note that newer Karpenter versions introduced a v1beta1 API with significant changes, so any new InstanceGroup API fields should probably reflect the v1beta1 API's terminology and/or schema.
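As a rough point of comparison, the v1beta1 API moves consolidation settings under a NodePool's `spec.disruption` block. The fragment below is a sketch based on my understanding of the upstream v1beta1 schema, not something kops renders today. The name `default` is a placeholder and the rest of the NodePool spec is omitted.

```yaml
# Sketch only: a v1beta1 NodePool fragment that avoids consolidating nodes.
# Field names follow the upstream v1beta1 schema as I understand it.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    # Only consider empty nodes for consolidation...
    consolidationPolicy: WhenEmpty
    # ...and never actually act on them.
    consolidateAfter: Never
```

A new InstanceGroup field modeled on this would likely expose a policy and a duration rather than a single enabled/disabled boolean.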