nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

AKS - Error while updating taint on the cluster #816

Open andresvasile opened 3 months ago

andresvasile commented 3 months ago

Hello guys, I'm trying to install Sysbox in my AKS cluster and I got this error:

Error from server: admission webhook "aks-node-validating-webhook.azmk8s.io" denied the request: (UID: 9a9f6a3d-c7c3-4dcc-b73b-a22885904ebe) Label update/add request "sysbox-runtime:installing" refused. User is attempting to update/add a label configured on node pool. Please change node pool label configuration through Azure..

I saw several issues mentioning that AKS changed its behavior and that it is no longer feasible to update the taints. I would like to know if you are aware of this, or if there is a different workaround for running Sysbox on AKS.

cc: @ctalledo

Thanks.

ctalledo commented 3 months ago

Thanks @andresvasile.

I would like to know if you are aware of this or if it's a different workaround to manage Sysbox in AKS

Not aware of it, thanks for letting us know.

I saw different issues (https://github.com/Azure/AKS/issues/2934) that mention that AKS modified their workaround, and it is no longer feasible to update the taints.

Strange that they wouldn't allow a daemonset to set the labels though, I believe it's a fairly common practice.

When running the sysbox-deploy-k8s daemonset, what do the logs for the daemonset show? Maybe there's a further clue in there.

FYI, the sysbox-deploy-k8s daemonset runs a bash script that "drops" Sysbox onto the K8s node(s) and then reconfigures K8s to become aware of the new runtime. That script sets the labels using a function called add_label_to_nodes (see here for example).
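For readers unfamiliar with that step, here is a minimal sketch of what a node-labeling helper of this kind might look like. It is not copied from the sysbox-deploy-k8s script: the function name `label_node` and the `NODE_NAME` variable are assumptions made for illustration, and the only real command used is `kubectl label node ... --overwrite`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the actual add_label_to_nodes from sysbox-deploy-k8s.
# NODE_NAME is assumed to hold the name of the node being configured.

label_node() {
  local label="$1"   # e.g. "sysbox-runtime=installing"
  echo "Adding K8s label \"${label}\" to node ..."
  # --overwrite lets the call succeed when the key already exists with a different value.
  kubectl label node "${NODE_NAME}" --overwrite "${label}"
}
```

Under these assumptions, a call such as `NODE_NAME=<node> label_node "sysbox-runtime=installing"` is the kind of request that an admission webhook on the cluster can accept or deny.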

andresvasile commented 3 months ago

Sure, here is the full log:

Detected Kubernetes version v1.29
Adding K8s taint "sysbox-runtime=not-running:NoSchedule" to node ...
node/aks-isolated-26818767-vmss000000 modified
Adding K8s label "crio-runtime=installing" to node ...
node/aks-isolated-26818767-vmss000000 not labeled
Deploying CRI-O installer agent on the host (v1.29) ...
Running CRI-O installer agent on the host (may take several seconds) ...
Removing CRI-O installer agent from the host ...
Configuring CRI-O ...
Configuring CRI-O for GKE
Adding K8s label "sysbox-runtime=installing" to node ...
Error from server: admission webhook "aks-node-validating-webhook.azmk8s.io" denied the request: (UID: c80b6ee7-8444-4752-b5f3-963a6db70b09) Label update/add request "sysbox-runtime:installing" refused. User is attempting to update/add a label configured on node pool. Please change node pool label configuration through Azure..

andresvasile commented 3 months ago

@ctalledo for now I'm using this particular command to bypass the validation:

kubectl delete ValidatingWebhookConfiguration aks-node-validating-webhook

but I'm not really sure what concerns or implications it could bring.
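Since the implications are unclear, one cautious variant is to save the webhook definition before deleting it, so it can be inspected or restored later. The following is a sketch under assumptions: the function name `backup_and_delete_webhook` and the backup path are made up for illustration, and it has not been validated against AKS.

```shell
#!/usr/bin/env bash
# Sketch only: keeps a copy of the webhook configuration before removing it.
# The function name and default backup path are arbitrary, hypothetical choices.

backup_and_delete_webhook() {
  local name="aks-node-validating-webhook"
  local backup="${1:-./${name}.backup.yaml}"

  # Save the current definition first; abort if it cannot be read.
  if ! kubectl get validatingwebhookconfiguration "${name}" -o yaml > "${backup}"; then
    echo "Could not read ${name}; nothing deleted." >&2
    return 1
  fi

  kubectl delete validatingwebhookconfiguration "${name}"
  echo "Backup written to ${backup}"
}
```

Server-generated metadata in the saved YAML (for example `resourceVersion` and `uid`) may need to be removed before the file can be re-created with `kubectl`. Judging from the error message, the webhook protects labels that are configured on the node pool, so while it is absent other label changes on those nodes would presumably go unchecked as well.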

ctalledo commented 1 month ago

Thanks @andresvasile for the work-around above to install Sysbox on AKS.

I guess we could improve the sysbox-deploy-k8s daemonset to detect it's installing on AKS and try to label the node, but if it's unsuccessful (e.g., because AKS restricts this), then just move on instead of failing.

IIRC, sysbox-deploy-k8s uses those labels to keep track of its progress as it installs Sysbox on the K8s cluster, as well as to prevent Sysbox containers from getting deployed on a node until Sysbox is actually installed and running on it.
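The "try to label, then move on" idea could be sketched roughly as follows. This is a hypothetical illustration, not a patch to sysbox-deploy-k8s: the function name `try_label_node` is invented, and matching on the webhook name in the error text is an assumption based on the message quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of best-effort labeling; not the actual daemonset script.

try_label_node() {
  local node="$1" label="$2" output

  if output="$(kubectl label node "${node}" --overwrite "${label}" 2>&1)"; then
    echo "${output}"
    return 0
  fi

  # Tolerate only the AKS webhook denial; any other failure is still an error.
  if [[ "${output}" == *"aks-node-validating-webhook"* ]]; then
    echo "Warning: AKS refused label \"${label}\"; continuing without it." >&2
    return 0
  fi

  echo "${output}" >&2
  return 1
}
```

One trade-off of this approach: if the label is skipped, whatever relies on it to keep Sysbox pods off a node until Sysbox is running would need another signal, so this only makes sense if the rest of the install flow can cope without the label.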