wrenkredhat2 closed this issue 5 months ago.
I rebooted the master nodes with no luck. It seems that no pods can be started because the admission webhook is not responding. @rbo, we once again need your help. The kubeconfig for client-auth access is on stormshiftdeploy in the usual directory.
Heads up @cluster/ocp3-admin - the "cluster/ocp3" label was applied to this issue.
I deleted the very last volcano-admission-service and am now waiting for the scheduler to reconcile automatically. If it does not, I'll restart the masters gracefully. Is this OK?
Now the console seems to be starting:
4s Normal SuccessfulCreate replicaset/oauth-openshift-5bb5f4f579 Created pod: oauth-openshift-5bb5f4f579-w6mp7
4s Normal SuccessfulCreate replicaset/oauth-openshift-5bb5f4f579 Created pod: oauth-openshift-5bb5f4f579-x8hj8
Sure, the cluster is broken anyway; feel free to restart the masters as you like.
The problem is fixed now. How? I deleted all validatingwebhookconfigurations and mutatingwebhookconfigurations whose names start with "volcano-". As this did not automatically reconcile the cluster, I followed up by restarting the masters carefully, one by one, as described in https://access.redhat.com/solutions/6089061. After a few minutes the console login worked, the admin login worked, and the event log looked normal.
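The cleanup described in the comment above could be scripted. The following is a minimal Python sketch, not the exact procedure used here: it assumes you saved the output of `oc get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json` and that every leftover Volcano configuration name starts with the `volcano-` prefix. The function name `volcano_webhook_delete_commands` is hypothetical. It only builds `oc delete` commands so they can be reviewed before anything is removed from the cluster.

```python
import json
import shlex

# Map each item's "kind" to the resource type name accepted by `oc delete`.
RESOURCE_FOR_KIND = {
    "ValidatingWebhookConfiguration": "validatingwebhookconfiguration",
    "MutatingWebhookConfiguration": "mutatingwebhookconfiguration",
}


def volcano_webhook_delete_commands(list_json: str, prefix: str = "volcano-") -> list[str]:
    """Build (but never run) `oc delete` commands for webhook configurations
    whose name starts with `prefix`.

    Assumption: `list_json` is the List returned by
    `oc get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json`,
    where each item carries its own "kind".
    """
    doc = json.loads(list_json)
    commands = []
    for item in doc.get("items", []):
        resource = RESOURCE_FOR_KIND.get(item.get("kind", ""))
        name = item.get("metadata", {}).get("name", "")
        # Skip unknown kinds and anything that does not carry the prefix.
        if resource and name.startswith(prefix):
            commands.append(f"oc delete {resource} {shlex.quote(name)}")
    return commands
```

For example, read the saved JSON file into a string, pass it to the function, and print the returned commands; check each one before running it against the cluster.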
Hello all,
Due to an installation of a Volcano.sh instance, ocp3 has become inoperative.
The most visible symptom is that the login page is no longer available.
The issue appeared when the Volcano instance was uninstalled: the uninstall removed the services that process the admission requests, but it did not remove the validating and mutating webhooks that point to them.
I deleted those manually; however, they still seem to be held in memory.
Therefore, I believe the master nodes have to be restarted gracefully, or at least some of the pods.
The current error message is:
m18s Warning FailedCreate replicaset/oauth-openshift-7b67db7d95 Error creating: Internal error occurred: failed calling webhook "validatepod.volcano.sh": failed to call webhook: Post "https://volcano-admission-service.wrenk-volcano-system.svc:443/pods/validate?timeout=10s": service "volcano-admission-service" not found
I deleted those webhooks, but the cluster still requires the services to exist,
and I believe this currently applies to all pods.
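The error message shows the mechanism: the webhook `validatepod.volcano.sh` still points at the service `volcano-admission-service` in `wrenk-volcano-system`, and since that call fails, pod creation is rejected (presumably because the webhook's `failurePolicy` is `Fail`). Below is a minimal Python sketch for spotting such orphaned webhooks. It assumes two inputs: the JSON output of `oc get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json` and of `oc get services -A -o json`. The function name `orphaned_webhooks` is hypothetical. It reports only webhooks whose `failurePolicy` is `Fail`, which is the `admissionregistration.k8s.io/v1` default when the field is unset; with `Ignore`, a missing service would not block requests. It does not evaluate `rules` or namespace selectors, so a reported webhook may not actually match every pod.

```python
import json


def orphaned_webhooks(webhook_list_json: str, service_list_json: str) -> list[tuple[str, str, str]]:
    """Return (configuration, webhook, "namespace/service") for each webhook whose
    backing in-cluster service is missing and whose failurePolicy is Fail."""
    # Set of (namespace, name) pairs for all services that currently exist.
    existing = {
        (svc["metadata"]["namespace"], svc["metadata"]["name"])
        for svc in json.loads(service_list_json).get("items", [])
    }
    orphans = []
    for config in json.loads(webhook_list_json).get("items", []):
        config_name = config["metadata"]["name"]
        for hook in config.get("webhooks", []):
            service = hook.get("clientConfig", {}).get("service")
            if service is None:
                continue  # URL-based webhook, not tied to an in-cluster service
            # In admissionregistration.k8s.io/v1 an unset failurePolicy defaults to Fail.
            if hook.get("failurePolicy", "Fail") != "Fail":
                continue
            key = (service["namespace"], service["name"])
            if key not in existing:
                orphans.append((config_name, hook["name"], f"{key[0]}/{key[1]}"))
    return orphans
```

Any entry this returns names a configuration that would make the API server reject matching requests until the service is restored or the configuration is deleted.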
The whole story can be found in: https://redhat-internal.slack.com/archives/C04J8QF8Y83/p1706100817959359
I want to apologize for this situation; I should have tested this beforehand in a sandbox environment.
Please reconcile this, as I do not know exactly how to do it and I do not want to cause more harm.
Thank you !
Wolfgang