Closed: carbonin closed this pull request 4 weeks ago.
@carbonin: This pull request references MGMT-10006 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.17.0" version, but no target version was set.
/test ?
@carbonin: The following commands are available to trigger required jobs:
/test e2e-agent-compact-ipv4
/test edge-assisted-operator-catalog-publish-verify
/test edge-ci-index
/test edge-e2e-ai-operator-ztp
/test edge-e2e-ai-operator-ztp-sno-day2-workers
/test edge-e2e-ai-operator-ztp-sno-day2-workers-late-binding
/test edge-e2e-metal-assisted
/test edge-e2e-metal-assisted-4-12
/test edge-e2e-metal-assisted-cnv-4-16
/test edge-e2e-metal-assisted-lvm
/test edge-e2e-metal-assisted-odf-4-16
/test edge-images
/test edge-lint
/test edge-subsystem-aws
/test edge-subsystem-kubeapi-aws
/test edge-unit-test
/test edge-verify-generated-code
/test images
/test mce-images
The following commands are available to trigger optional jobs:
/test e2e-agent-ha-dualstack
/test e2e-agent-sno-ipv6
/test edge-e2e-ai-operator-disconnected-capi
/test edge-e2e-ai-operator-ztp-3masters
/test edge-e2e-ai-operator-ztp-capi
/test edge-e2e-ai-operator-ztp-compact-day2-masters
/test edge-e2e-ai-operator-ztp-compact-day2-workers
/test edge-e2e-ai-operator-ztp-disconnected
/test edge-e2e-ai-operator-ztp-hypershift-zero-nodes
/test edge-e2e-ai-operator-ztp-multiarch-3masters-ocp
/test edge-e2e-ai-operator-ztp-multiarch-sno-ocp
/test edge-e2e-ai-operator-ztp-node-labels
/test edge-e2e-ai-operator-ztp-sno-day2-masters
/test edge-e2e-ai-operator-ztp-sno-day2-workers-ignitionoverride
/test edge-e2e-metal-assisted-4-13
/test edge-e2e-metal-assisted-4-14
/test edge-e2e-metal-assisted-4-15
/test edge-e2e-metal-assisted-4-16
/test edge-e2e-metal-assisted-bond
/test edge-e2e-metal-assisted-bond-4-14
/test edge-e2e-metal-assisted-day2
/test edge-e2e-metal-assisted-day2-arm-workers
/test edge-e2e-metal-assisted-day2-single-node
/test edge-e2e-metal-assisted-external
/test edge-e2e-metal-assisted-external-4-14
/test edge-e2e-metal-assisted-ipv4v6
/test edge-e2e-metal-assisted-ipv6
/test edge-e2e-metal-assisted-kube-api-late-binding-single-node
/test edge-e2e-metal-assisted-kube-api-late-unbinding-ipv4-single-node
/test edge-e2e-metal-assisted-kube-api-net-suite
/test edge-e2e-metal-assisted-mce-4-16
/test edge-e2e-metal-assisted-mce-sno-4-16
/test edge-e2e-metal-assisted-metallb
/test edge-e2e-metal-assisted-none
/test edge-e2e-metal-assisted-onprem
/test edge-e2e-metal-assisted-single-node
/test edge-e2e-metal-assisted-static-ip-suite
/test edge-e2e-metal-assisted-static-ip-suite-4-14
/test edge-e2e-metal-assisted-tang
/test edge-e2e-metal-assisted-tpmv2
/test edge-e2e-metal-assisted-upgrade-agent
/test edge-e2e-nutanix-assisted
/test edge-e2e-nutanix-assisted-2workers
/test edge-e2e-nutanix-assisted-4-14
/test edge-e2e-oci-assisted
/test edge-e2e-oci-assisted-4-14
/test edge-e2e-oci-assisted-iscsi
/test edge-e2e-vsphere-assisted
/test edge-e2e-vsphere-assisted-4-14
/test edge-e2e-vsphere-assisted-4-15
/test edge-e2e-vsphere-assisted-4-16
/test edge-e2e-vsphere-assisted-umn
/test okd-scos-images
/test push-pr-image
Use /test all to run the following jobs that were automatically triggered:
pull-ci-openshift-assisted-service-master-e2e-agent-compact-ipv4
pull-ci-openshift-assisted-service-master-edge-ci-index
pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-disconnected-capi
pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-ztp
pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-ztp-capi
pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted
pull-ci-openshift-assisted-service-master-edge-images
pull-ci-openshift-assisted-service-master-edge-lint
pull-ci-openshift-assisted-service-master-edge-subsystem-aws
pull-ci-openshift-assisted-service-master-edge-subsystem-kubeapi-aws
pull-ci-openshift-assisted-service-master-edge-unit-test
pull-ci-openshift-assisted-service-master-edge-verify-generated-code
pull-ci-openshift-assisted-service-master-images
pull-ci-openshift-assisted-service-master-mce-images
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: carbonin
/hold
I want to be able to run the remove-node job, but it's only configured as a periodic job. I'll fix that and then run it from here.
Added the presubmit job in https://github.com/openshift/release/pull/55372
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 69.40%. Comparing base (6dd2882) to head (a8a9b4e). Report is 12 commits behind head on master.
/test ?
@carbonin: The bot listed the same required and optional jobs as above, now with one additional optional job:
/test edge-e2e-ai-operator-ztp-remove-node
/test edge-e2e-ai-operator-ztp-remove-node
/unhold
@eranco74 can you take a look at this one?
/retest-required
Remaining retests: 0 against base HEAD 61c279a23a9720d34a452befad7a6c8c3db11b80 and 2 for PR HEAD a8a9b4e31cfdb7c4ca61aaeacf371092467e0644 in total
/retest-required
/test edge-e2e-ai-operator-ztp
Failed during finalizing.
Events:
{
"cluster_id": "2d5a017d-dac0-4fb6-bee4-4a6a65665015",
"event_time": "2024-08-15T12:55:31.616Z",
"message": "Updated status of the cluster to finalizing",
"name": "cluster_status_updated",
"severity": "info"
},
{
"cluster_id": "2d5a017d-dac0-4fb6-bee4-4a6a65665015",
"event_time": "2024-08-15T12:56:01.518Z",
"message": "Updated finalizing stage of the cluster to 'Waiting for cluster operators'",
"name": "cluster_finalizing_stage_updated",
"severity": "info"
},
{
"cluster_id": "2d5a017d-dac0-4fb6-bee4-4a6a65665015",
"event_time": "2024-08-15T12:56:01.530Z",
"message": "Operator console status: progressing message: ",
"name": "cluster_operator_status",
"severity": "info"
},
{
"cluster_id": "2d5a017d-dac0-4fb6-bee4-4a6a65665015",
"event_time": "2024-08-15T12:56:01.539Z",
"message": "Operator cvo status: failed message: ",
"name": "cluster_operator_status",
"severity": "info"
},
Controller logs show a CVO failure followed by connection-refused errors from the API server:
time="2024-08-15T13:02:01Z" level=info msg="CVO status conditions: [{Type:RetrievedUpdates Status:False LastTransitionTime:2024-08-15 12:48:05 +0000 UTC Reason:VersionNotFound Message:Unable to retrieve available updates: currently reconciling cluster version 4.17.0-0.nightly-2024-08-13-031847 not found in the \"stable-4.17\" channel} {Type:ImplicitlyEnabledCapabilities Status:False LastTransitionTime:2024-08-15 12:48:05 +0000 UTC Reason:AsExpected Message:Capabilities match configured spec} {Type:ReleaseAccepted Status:True LastTransitionTime:2024-08-15 12:48:05 +0000 UTC Reason:PayloadLoaded Message:Payload loaded version=\"4.17.0-0.nightly-2024-08-13-031847\" image=\"registry.build03.ci.openshift.org/ci-op-nlt6g4kr/release@sha256:609cf659f5182ccd0e85987cde6fa234b68bdf6dab8729860d7413445d0e5dda\" architecture=\"amd64\"} {Type:Available Status:False LastTransitionTime:2024-08-15 12:48:05 +0000 UTC Reason: Message:} {Type:Failing Status:True LastTransitionTime:2024-08-15 13:00:03 +0000 UTC Reason:MultipleErrors Message:Multiple errors are preventing progress:\n* Cluster operators authentication, etcd, image-registry, ingress, kube-apiserver, kube-controller-manager, monitoring, openshift-apiserver, openshift-controller-manager, openshift-samples, operator-lifecycle-manager-packageserver are not available\n* Could not update imagestream \"openshift/driver-toolkit\" (612 of 900): the server is down or not responding\n* Could not update oauthclient \"console\" (546 of 900): the server does not recognize this resource, check extension API servers\n* Could not update role \"openshift-console-operator/prometheus-k8s\" (818 of 900): resource may have been deleted\n* Could not update role \"openshift-console/prometheus-k8s\" (822 of 900): resource may have been deleted} {Type:Progressing Status:True LastTransitionTime:2024-08-15 12:48:05 +0000 UTC Reason:MultipleErrors Message:Unable to apply 4.17.0-0.nightly-2024-08-13-031847: an unknown error has occurred: MultipleErrors}]" func=github.com/openshift/assisted-installer/src/assisted_installer_controller.ClusterVersionHandler.GetStatus file="/go/src/github.com/openshift/assisted-installer/src/assisted_installer_controller/operator_handler.go:128"
time="2024-08-15T13:03:01Z" level=error msg="Failed to check if console is enabled" func="github.com/openshift/assisted-installer/src/assisted_installer_controller.(*controller).waitingForClusterOperators.func1" file="/go/src/github.com/openshift/assisted-installer/src/assisted_installer_controller/assisted_installer_controller.go:1033" error="Get \"https://localhost:6443/apis/config.openshift.io/v1/clusterversions/version\": dial tcp [::1]:6443: connect: connection refused"
time="2024-08-15T13:04:01Z" level=info msg="Start uploading controller logs (intermediate snapshot)" func="github.com/openshift/assisted-installer/src/assisted_installer_controller.(*controller).UploadLogs" file="/go/src/github.com/openshift/assisted-installer/src/assisted_installer_controller/assisted_installer_controller.go:1464"
time="2024-08-15T13:04:01Z" level=info msg="Uploading logs for assisted-installer-controller-lzzk7 in assisted-installer" func=github.com/openshift/assisted-installer/src/common.UploadPodLogs file="/go/src/github.com/openshift/assisted-installer/src/common/common.go:136"
time="2024-08-15T13:04:01Z" level=error msg="Failed to get logs from kube-api, reading from file" func=github.com/openshift/assisted-installer/src/common.GetControllerPodLogs file="/go/src/github.com/openshift/assisted-installer/src/common/common.go:112" error="Get \"https://localhost:6443/api/v1/namespaces/assisted-installer/pods/assisted-installer-controller-lzzk7/log\": dial tcp [::1]:6443: connect: connection refused"
/retest-required
@carbonin: all tests passed!
Only BMAC should be aware of the BMH and its status. When deprovisioning a node by deleting a BMH, make BMAC wait for the node to fully deprovision before annotating and deleting the agent and removing the finalizer.
This adds a new annotation for the Agent resource (agent.agent-install/clean-spoke-on-delete), which is used in place of the BMAC annotation to tell the agent controller that the node should be removed. This gets us a bit closer to https://issues.redhat.com/browse/MGMT-10006 by removing the BMH concept from the "remove a node" flow in the agent controller, which will allow this logic to be more easily reused in the late-binding/non-BMH case.
List all the issues related to this PR
https://issues.redhat.com/browse/MGMT-10006
What environments does this code impact?
How was this code tested?
Tested manually using a dev-scripts cluster.
bmac.agent-install.openshift.io/remove-agent-and-node-on-delete and removed the BMH
Checklist
docs, README, etc)
Reviewers Checklist