Closed Nicklason closed 1 year ago
First of all thank you for the report and the detailed steps to reproduce this. We will take a look tomorrow.
Hi @Nicklason, this looks like it relates to a known limitation of the minikube storage implementation. I found https://github.com/kubernetes/minikube/issues/12360 which sounds very related. Apparently the storage implementation in minikube is quite simple and may not work for edge cases like multi-node clusters.
I was able to reproduce your issue and tried the fix mentioned in a comment and that seemed to fix it for me.
Basically, before deploying your ZooKeeper object, run the following:
minikube addons disable storage-provisioner
minikube addons disable default-storageclass
minikube addons enable volumesnapshots
minikube addons enable csi-hostpath-driver
kubectl patch storageclass csi-hostpath-sc -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
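As a side note, the JSON payload used by the patch command above can also be kept as a standalone file and validated before applying, which helps catch quoting mistakes (the `/tmp/...` path is just an example; `kubectl patch` also accepts `--patch-file` if you prefer not to inline the JSON):

```shell
# The same annotation payload the inline patch command uses, written to a file.
cat > /tmp/default-class-patch.json <<'EOF'
{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}
EOF
# json.tool exits non-zero on malformed JSON, so this doubles as a sanity check.
python3 -m json.tool /tmp/default-class-patch.json
```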
One thing to be aware of: for me the CSI driver bound the volume to the cordoned node, which left the pod stuck in "Pending" because no node was available to schedule it to. Deleting the PVC fixed that for me, as it was recreated and bound to the non-cordoned node.
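The PVC recreation step can be sketched as follows. The PVC name below is hypothetical; list your PVCs first and use the actual name of the claim that is bound to the cordoned node:

```shell
# Find the PVC that was provisioned on the cordoned node.
kubectl get pvc
# Delete it so it gets recreated and provisioned on a schedulable node
# ("data-simple-zk-server-default-0" is a made-up example name).
kubectl delete pvc data-simple-zk-server-default-0
```

Note that if the stuck pod still references the claim, Kubernetes' PVC protection may hold the deletion until the pod is gone, so you may need to delete the pending pod as well.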
If this fixes it we should document this.
In principle I agree, and for this specific thing I'm happy to add it somewhere. However, this will probably be the start of an entire "known shortcomings of various Kubernetes distros" section in our documentation, which can easily become a bottomless pit :)
Yeah, a section like that could help but would be hard to maintain... What if we instead adapt the issue template and add a hint like "Did you try to reproduce the issue locally on Kind / K3s?", since I assume that is what we mostly use to test locally? Then it's easier to determine whether it's a bug on our side or in a particular Kubernetes distro.
We already have this for other distros: https://docs.stackable.tech/home/nightly/secret-operator/installation.html#_huawei_cloud
@soenkeliebau Thanks a lot for the quick help. I just followed the steps you provided and the pod started properly. Just as a note, the patch command to make the CSI provisioner storage class the default did not work; it looks like GitHub formatted it as a link.
Thanks for the hint about the GitHub link, @Nicklason. I adapted @soenkeliebau's comment. Closing this.
Affected version
23.7.0
Current and expected behavior
Steps to reproduce:
minikube start
kubectl cordon minikube
minikube node add
I apply a basic ZookeeperCluster resource. It creates one pod, but the init container crashes and the pod is stuck in Init:CrashLoopBackOff.
Possible solution
If the ZooKeeper pod runs on the "minikube" node (the control plane node) then it works, but if it runs on a worker node then it does not.
This issue may be related to #357.
Additional context
ZooKeeper cluster:
Error:
Environment
minikube cluster using the Docker driver with one control plane node and one worker node.
Kubernetes: v1.27.4
minikube: v1.31.2
zookeeper-operator: v23.7.0
Would you like to work on fixing this bug?
None