thangchung opened this issue 1 month ago.
Interesting. Were you able to SSH into the cluster node and check whether the spin shim binary still exists in PATH, or whether containerd's config.toml still has the CRI config for the spin shim?
I'm seeing the same behavior. Indeed, when the (new?) node(s) come back up after the AKS stop/restart, they are missing the spin shim CRI config, so the SpinApp pods are stuck in ContainerCreating with: failed to get sandbox runtime: no runtime for "spin" is configured.
The current quick fix is to re-annotate the node(s), e.g. via kubectl annotate node --all kwasm.sh/kwasm-node=true. (There should be no need to delete spinkube and re-install it.) But the best resolution would be for AKS to preserve the containerd configuration through the stop/restart cycle.
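As a hedged variation on the one-liner above, the sketch below annotates only the nodes that currently lack kwasm.sh/kwasm-node=true, shelling out to kubectl. The function names are hypothetical. Whether re-applying an unchanged annotation re-triggers kwasm provisioning is not established here, so this targets nodes where the annotation is missing or has a different value.

```python
import json
import subprocess

ANNOTATION_KEY = "kwasm.sh/kwasm-node"
ANNOTATION_VALUE = "true"


def nodes_missing_annotation(node_list: dict) -> list[str]:
    """Given parsed `kubectl get nodes -o json` output, return the names of
    nodes that do not carry kwasm.sh/kwasm-node=true."""
    missing = []
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        annotations = meta.get("annotations") or {}
        if annotations.get(ANNOTATION_KEY) != ANNOTATION_VALUE:
            missing.append(meta["name"])
    return missing


def reannotate_missing_nodes() -> list[str]:
    """Annotate only the nodes that lack the annotation; return their names."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    missing = nodes_missing_annotation(json.loads(out))
    for name in missing:
        # --overwrite covers nodes where the key exists with a different value.
        subprocess.run(
            ["kubectl", "annotate", "node", name,
             f"{ANNOTATION_KEY}={ANNOTATION_VALUE}", "--overwrite"],
            check=True,
        )
    return missing
```

Calling reannotate_missing_nodes() with a kubeconfig pointing at the cluster would print nothing on success and return the list of nodes it touched.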
I will reach out to the AKS team to find out about the configuration issue.
Thanks, @Mossaka and @vdice, for acting on it. I'm waiting for https://github.com/spinkube/azure/issues/25.
This issue also affects Kubernetes clusters outside of Azure that have capabilities like horizontal cluster auto-scaling or scheduled node upgrades.
As an interim solution, I created a small DaemonSet that starts a Job to annotate the current Kubernetes node.
Although the solution isn't ideal, it guarantees that new nodes will be annotated with kwasm.sh/kwasm-node=true.
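The per-node logic of such a workaround could look roughly like the sketch below. It is not the workaround described above: for simplicity it annotates directly from the DaemonSet pod instead of starting a Job. It assumes NODE_NAME is injected through the downward API (fieldRef spec.nodeName), that kubectl is in the image, and that the pod's service account may get and patch nodes. All function names are hypothetical.

```python
import os
import subprocess
import time
from typing import Callable, Optional

ANNOTATION_KEY = "kwasm.sh/kwasm-node"
ANNOTATION_VALUE = "true"


def ensure_node_annotated(
    node_name: str,
    get_annotation: Callable[[str], Optional[str]],
    annotate: Callable[[str], None],
) -> bool:
    """Annotate node_name unless it already has the annotation.

    Returns True if an annotation was applied. The callables are injected so
    the decision logic can be tested without a cluster.
    """
    if get_annotation(node_name) == ANNOTATION_VALUE:
        return False
    annotate(node_name)
    return True


def kubectl_get_annotation(node_name: str) -> Optional[str]:
    # kubectl JSONPath requires dots inside the key to be escaped.
    escaped = ANNOTATION_KEY.replace(".", "\\.")
    out = subprocess.run(
        ["kubectl", "get", "node", node_name,
         "-o", f"jsonpath={{.metadata.annotations.{escaped}}}"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return out or None


def kubectl_annotate(node_name: str) -> None:
    subprocess.run(
        ["kubectl", "annotate", "node", node_name,
         f"{ANNOTATION_KEY}={ANNOTATION_VALUE}", "--overwrite"],
        check=True,
    )


def main() -> None:
    node_name = os.environ["NODE_NAME"]  # assumed downward API injection
    ensure_node_annotated(node_name, kubectl_get_annotation, kubectl_annotate)
    # DaemonSet pods are restarted when they exit, so stay idle afterwards.
    while True:
        time.sleep(3600)
```

The container's entrypoint would call main(). Because a DaemonSet schedules one pod on every node, including nodes added by autoscaling or upgrades, each new node gets checked once when its pod starts.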
@Mossaka I'm happy to polish my workaround and publish it on GitHub so that others will have a solution for this.
I followed the guidance in the README file, and it worked very well. However, one issue I've run into is that if I stop the AKS cluster and start it again, the SpinApp (deployment) stays in Pending status forever. See below:
The logs:
104s  Normal   Scheduled               pod/simple-spinapp-84c9b4885b-bf682  Successfully assigned default/simple-spinapp-84c9b4885b-bf682 to aks-nodepool1-18815957-vmss000001
12s   Warning  FailedCreatePodSandBox  pod/simple-spinapp-84c9b4885b-bf682  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured
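To spot every pod hit by this failure at once, a small filter over the cluster's events can look for the FailedCreatePodSandBox reason and the error text from the log above. This is a sketch that assumes the standard core/v1 Event fields returned by kubectl get events -o json; the function names are hypothetical.

```python
import json
import subprocess

# Substring taken from the FailedCreatePodSandBox event in the logs above.
MISSING_RUNTIME = 'no runtime for "spin" is configured'


def pods_missing_spin_runtime(event_list: dict) -> set[str]:
    """Return namespace/name of pods whose sandbox creation failed because
    the spin runtime is not configured on their node."""
    affected = set()
    for event in event_list.get("items", []):
        if event.get("reason") != "FailedCreatePodSandBox":
            continue
        if MISSING_RUNTIME not in event.get("message", ""):
            continue
        obj = event.get("involvedObject", {})
        if obj.get("kind") == "Pod":
            affected.add(f'{obj.get("namespace")}/{obj.get("name")}')
    return affected


def fetch_events() -> dict:
    """Fetch events from all namespaces via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "events", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Usage: pods_missing_spin_runtime(fetch_events()) would list the affected pods. Their spec.nodeName then points at the nodes whose containerd config needs repairing.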
I tried to delete it by using:
And
It still did not work.
The only way to make it work again is to run helm delete spinkube and then re-install it on the AKS cluster.