From the controller logs:
E1106 05:02:05.037486 1 controller.go:324] "Reconciler error" err="admission webhook \"validation.openstackmachine.infrastructure.cluster.x-k8s.io\" denied the request: OpenStackMachine.infrastructure.cluster.x-k8s.io \"test-migration-control-plane-pcklj\" is invalid: spec: Forbidden: cannot be modified" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="default/test-migration-control-plane-pcklj" namespace="default" name="test-migration-control-plane-pcklj" reconcileID=918ff92b-887e-44c1-946e-a0b7f393c098
Looks like you are updating the control plane, which leads the admission webhook to reject the action while it's trying to update the spec.
Can you describe the exact steps you took? e.g. delete the VM from Horizon, then create one from Horizon with the same IP/hostname, etc.
Through the UI?
No, I deleted one of the master nodes from Horizon; after that a new one was created automatically.
@jichenjc Any update?
Seeing the same thing. I killed a worker node VM in Horizon. A moment later CAPO spins up a new one, but it's unable to get through bootstrap. CAPO logs the same thing mentioned above in this ticket.
Based on the log message, the LoC has to be this one, meaning the spec is being changed. The spec seems to be immutable by design, but I'm not sure if that is a mistake or if a delete-and-recreate should have been done instead of an update.
Other than that, I also noticed that the preflight check failed in cloud-init:
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID
I do believe this is just an end result, and what is going on is (TL;DR): we should probably look into the logic that is trying to update the OpenStackMachine rather than deleting it upon the "deletion event", creating a new object, and reconciling that new object. Any hints regarding where to look for this logic are welcome.
ping @jichenjc
@xirehat Would you be willing to rename the issue to something more generic, like "OpenStackMachine reconcile fails after a VM is deleted"? I do not wish to "spam" with a new issue, but as I mentioned above, I'm experiencing basically the same thing, and I believe we should look into why the old OpenStackMachine is not deleted and a new one created, but instead some controller is trying to update a by-design immutable spec.
Hey @xirehat, I'm still going to look into this, but I thought this might help you. Please check out Healthchecking for more info.
After the VM is deleted, it gets recreated during the standard reconcile loop in the getOrCreate function. AFAIK there is no way (the way I understand it's designed) for the KubeadmConfig controller to be notified that the KubeadmConfig needs to be refreshed, and even if it were, the OpenStackMachine's spec is immutable, so the new ID cannot be saved. This leads me to 2 conclusions:
1. It is the Machine that "owns" the OpenStackMachine that should be deleted and recreated, not the OpenStackMachine updated in place.
2. You should probably check MachineHealthCheck, as linked above, since that is exactly what it does.
TL;DR: the flow with MachineHealthCheck can be the following: when the Node backing a Machine becomes unhealthy (e.g. its VM was deleted), the MachineHealthCheck deletes that Machine, and a new Machine with a new OpenStackMachine is created and reconciled from scratch. Hope that helps :-)
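For illustration, a MachineHealthCheck along those lines could look roughly like the sketch below. The cluster name is inferred from the log above, but the MachineDeployment label, timeouts, and maxUnhealthy are placeholder assumptions, not values from this issue; control-plane Machines can be targeted the same way via the cluster.x-k8s.io/control-plane label.

```yaml
# Sketch of a MachineHealthCheck that remediates worker Machines whose Node
# goes unhealthy (e.g. the VM was deleted in Horizon). Names, labels, and
# timeouts below are placeholders.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: test-migration-worker-unhealthy-5m
  namespace: default
spec:
  clusterName: test-migration
  # Remediate at most this share of matching Machines at once.
  maxUnhealthy: 40%
  # Give newly created Machines time to bootstrap before they are judged.
  nodeStartupTimeout: 10m
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: test-migration-md-0
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```

Once a matching Machine stays unhealthy past the timeout, the MachineHealthCheck deletes it and the owning MachineDeployment (or KubeadmControlPlane) creates a fresh Machine and OpenStackMachine, avoiding the forbidden in-place spec update.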
Thanks @strudelPi :100: This solution helped me; I defined an MHC to fix this issue.
@xirehat I opened up another issue where important facts for a fix are summarized. Would you mind closing this issue? :-)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/close
@strudelPi: Closing this issue.
/kind bug
What steps did you take and what happened: I removed one of the master nodes from the OpenStack Horizon dashboard (I have 3 master nodes). After that, another node was provisioned, but it couldn't join the Kubernetes cluster.
What did you expect to happen: No error should occur
Anything else you would like to add:
Environment:
- Cluster API Provider OpenStack version (or git rev-parse HEAD if manually built):
- Kubernetes version (use kubectl version): v1.25.2
- OS (e.g. from /etc/os-release): Ubuntu 22.04