openshift / hypershift

Hyperscale OpenShift - clusters with hosted control planes
https://hypershift-docs.netlify.app
Apache License 2.0

OCPBUGS-38258: Set KCM node monitor grace period #4404

Closed rtheis closed 1 month ago

rtheis commented 1 month ago

What this PR does / why we need it:

The Kubernetes controller manager default for node monitor grace period is not sufficient to avoid node readiness flaps during brief connectivity problems.

Which issue(s) this PR fixes (optional, use `fixes #<issue_number>(, fixes #<issue_number>, ...)` format, where issue_number might be a GitHub issue, or a Jira story):

Fixes https://issues.redhat.com/browse/HOSTEDCP-1776

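Checklist

- [x] Subject and description added to both, commit and PR.
- [x] Relevant issues have been referenced.
- [ ] This change includes docs.
- [ ] This change includes unit tests.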

openshift-ci-robot commented 1 month ago

@rtheis: This pull request references HOSTEDCP-1776 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.17.0" version, but no target version was set.

rtheis commented 1 month ago

/hold for function testing

rtheis commented 1 month ago

/retest

openshift-ci-robot commented 1 month ago

@rtheis: This pull request references HOSTEDCP-1776 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.17.0" version, but no target version was set.

rtheis commented 1 month ago

/test e2e-azure

rtheis commented 1 month ago

/remove-hold

muraee commented 1 month ago

/lgtm

rtheis commented 1 month ago

@csrwng ptal

csrwng commented 1 month ago

/approve

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, rtheis

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/hypershift/blob/main/OWNERS)~~ [csrwng]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-ci-robot commented 1 month ago

/retest-required

Remaining retests: 0 against base HEAD a11f0a1b12da74f345953f9441221e4418c314f4 and 2 for PR HEAD 7a8312c9f3c5e9571ab5040ad841cb73b2175fea in total

openshift-ci[bot] commented 1 month ago

@rtheis: all tests passed!

openshift-bot commented 1 month ago

[ART PR BUILD NOTIFIER]

Distgit: hypershift

This PR has been included in build ose-hypershift-container-v4.18.0-202408080513.p0.g16fd540.assembly.stream.el9. All builds following this will include this PR.

celebdor commented 4 weeks ago

/cherry-pick release-4.16 release-4.15 release-4.14

openshift-cherrypick-robot commented 4 weeks ago

@celebdor: new pull request created: #4519

celebdor commented 4 weeks ago

/retitle OCPBUGS-38258: Set KCM node monitor grace period

openshift-ci-robot commented 4 weeks ago

@rtheis: Jira Issue OCPBUGS-38258 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

celebdor commented 4 weeks ago

/cherry-pick release-4.16 release-4.15 release-4.14

openshift-cherrypick-robot commented 4 weeks ago

@celebdor: new pull request created: #4520
