openshift / library-go

Helpers for going from apis and clients to useful runtime constructs
Apache License 2.0
94 stars, 226 forks

API-1835: migrate the node controller to SSA #1821

Closed: p0lyn0mial closed 1 week ago

p0lyn0mial commented 2 weeks ago

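Tested the PR manually [with a custom scaling test](https://github.com/openshift/cluster-kube-apiserver-operator/pull/1755/commits/7db519604936af57c07b9256c769277d3fe17025).

The original `NodeStatuses` were: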

nodeStatuses:
  - currentRevision: 6
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-1-65.ec2.internal
    targetRevision: 8
  - currentRevision: 6
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-38-169.ec2.internal
    targetRevision: 0
  - currentRevision: 6
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-87-84.ec2.internal
    targetRevision: 0

After adding a new node:

nodeStatuses:
  - currentRevision: 8
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-1-65.ec2.internal
    targetRevision: 0
  - currentRevision: 0
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-27-177.ec2.internal
    targetRevision: 8
  - currentRevision: 8
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-38-169.ec2.internal
    targetRevision: 0
  - currentRevision: 8
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-87-84.ec2.internal
    targetRevision: 0

After removing the old node:

nodeStatuses:
  - currentRevision: 9
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-27-177.ec2.internal
    targetRevision: 0
  - currentRevision: 9
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-38-169.ec2.internal
    targetRevision: 0
  - currentRevision: 9
    lastFailedCount: 0
    lastFailedReason: ""
    lastFailedRevision: 0
    lastFallbackCount: 0
    nodeName: ip-10-0-87-84.ec2.internal
    targetRevision: 0

oc get node
NAME                          STATUS   ROLES                  AGE    VERSION
ip-10-0-27-177.ec2.internal   Ready    control-plane,master   27m    v1.31.1
ip-10-0-3-155.ec2.internal    Ready    worker                 135m   v1.31.1
ip-10-0-38-169.ec2.internal   Ready    control-plane,master   143m   v1.31.1
ip-10-0-63-96.ec2.internal    Ready    worker                 136m   v1.31.1
ip-10-0-74-241.ec2.internal   Ready    worker                 136m   v1.31.1
ip-10-0-87-84.ec2.internal    Ready    control-plane,master   143m   v1.31.1

Proof PR: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1755
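
The add/remove behavior seen in the three `nodeStatuses` snapshots above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the controller's actual code: the `NodeStatus` type is a simplified stand-in carrying only three of the fields shown in the YAML, `syncNodeStatuses` is a hypothetical helper name, and sorting by node name is an assumption made here for deterministic output. The revision rollout itself (6 to 8 to 9) is not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeStatus is a simplified, hypothetical stand-in for the status entries
// shown in the YAML above. It is not the operator API type.
type NodeStatus struct {
	NodeName        string
	CurrentRevision int32
	TargetRevision  int32
}

// syncNodeStatuses (hypothetical helper) returns one entry per name in
// masterNodes: entries for nodes that still exist are kept unchanged,
// entries for nodes that are gone are dropped, and new nodes get a
// zero-valued entry. Sorting by name is an assumption for stable output.
func syncNodeStatuses(existing []NodeStatus, masterNodes []string) []NodeStatus {
	present := make(map[string]bool, len(masterNodes))
	for _, name := range masterNodes {
		present[name] = true
	}

	seen := make(map[string]bool, len(masterNodes))
	result := make([]NodeStatus, 0, len(masterNodes))

	// Keep the statuses of nodes that are still part of the cluster.
	for _, status := range existing {
		if present[status.NodeName] && !seen[status.NodeName] {
			result = append(result, status)
			seen[status.NodeName] = true
		}
	}

	// Add a fresh entry (all revisions zero) for every new node.
	for _, name := range masterNodes {
		if !seen[name] {
			result = append(result, NodeStatus{NodeName: name})
			seen[name] = true
		}
	}

	sort.Slice(result, func(i, j int) bool {
		return result[i].NodeName < result[j].NodeName
	})
	return result
}

func main() {
	statuses := []NodeStatus{
		{NodeName: "ip-10-0-1-65.ec2.internal", CurrentRevision: 6, TargetRevision: 8},
		{NodeName: "ip-10-0-38-169.ec2.internal", CurrentRevision: 6},
		{NodeName: "ip-10-0-87-84.ec2.internal", CurrentRevision: 6},
	}

	// Scale up: a fourth control-plane node joins.
	statuses = syncNodeStatuses(statuses, []string{
		"ip-10-0-1-65.ec2.internal",
		"ip-10-0-27-177.ec2.internal",
		"ip-10-0-38-169.ec2.internal",
		"ip-10-0-87-84.ec2.internal",
	})
	fmt.Println("after adding a node:", len(statuses), "entries")

	// Scale down: the old node is removed.
	statuses = syncNodeStatuses(statuses, []string{
		"ip-10-0-27-177.ec2.internal",
		"ip-10-0-38-169.ec2.internal",
		"ip-10-0-87-84.ec2.internal",
	})
	for _, status := range statuses {
		fmt.Println(status.NodeName)
	}
}
```

The point of the sketch is the invariant the manual test checks: after each membership change there is exactly one status entry per control-plane node, with no stale entry left for the removed node.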

p0lyn0mial commented 1 week ago

/assign @bertinatto @deads2k

openshift-ci-robot commented 1 week ago

@p0lyn0mial: This pull request references API-1835 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.18.0" version, but no target version was set.

bertinatto commented 1 week ago

/lgtm

openshift-ci[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bertinatto, p0lyn0mial

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/library-go/blob/master/OWNERS)~~ [bertinatto,p0lyn0mial]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-ci[bot] commented 1 week ago

@p0lyn0mial: all tests passed!