nestybox / sysbox-pkgr

Sysbox-pkgr repository

Add detection for RKE2 controller service in daemonset installer #106

Closed detjensrobert closed 1 year ago

detjensrobert commented 1 year ago

Previously, only rke2-agent.service was used to detect RKE2-based clusters, but a single-node cluster has only the controller node, which runs rke2-server.service instead.

This adds support for checking for rke2-server.service and handles stopping/starting whichever service is running on the node.
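As a rough illustration of the idea (not the code from this PR), the detection could look something like the sketch below. The function names `detect_rke2_service` and `restart_rke2` are hypothetical, and the sketch assumes `systemctl` can reach the host's systemd from wherever the script runs.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the name of whichever RKE2 systemd unit is
# active on this node, checking the agent unit first and then the server unit.
detect_rke2_service() {
  local unit
  for unit in rke2-agent.service rke2-server.service; do
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
      echo "$unit"
      return 0
    fi
  done
  return 1
}

# Hypothetical usage: stop whichever unit was detected, then start it again.
restart_rke2() {
  local unit
  unit="$(detect_rke2_service)" || {
    echo "no running RKE2 service found" >&2
    return 1
  }
  systemctl stop "$unit"
  # ... the installer's work would happen here ...
  systemctl start "$unit"
}
```

On a multi-node cluster's worker nodes this would pick rke2-agent.service; on a single-node cluster it would fall through to rke2-server.service.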

Fixes https://github.com/nestybox/sysbox/issues/692.

detjensrobert commented 1 year ago

This does not perform any checks to only install onto the RKE2 controller node when running as a single node cluster -- I am assuming that the user can make this decision and label or not label the controller node as an install target as desired, since this is a non-interactive install process.

Is this a good assumption or should the controller node detection be gated behind e.g. an environment variable flag on the daemonset?

ctalledo commented 1 year ago

Hi @detjensrobert, looks good to me, thanks for the contribution.

Question: have you had a chance to test sysbox-deploy-k8s on RKE2 with these changes, ideally on both a single-node and a multi-node cluster?

ctalledo commented 1 year ago

> This does not perform any checks to only install onto the RKE2 controller node when running as a single node cluster -- I am assuming that the user can make this decision and label or not label the controller node as an install target as desired, since this is a non-interactive install process.
>
> Is this a good assumption or should the controller node detection be gated behind e.g. an environment variable flag on the daemonset?

I think this is fine; the current process relies on the user labeling the nodes with sysbox-install=yes; users always need to label the nodes where they want Sysbox to be installed.
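For reference, labeling a node as an install target could be wrapped roughly as follows. This is a minimal sketch: `label_sysbox_node` is a hypothetical helper name, and the node name is a placeholder to be taken from `kubectl get nodes`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: mark a node as a Sysbox install target by applying
# the sysbox-install=yes label mentioned above.
label_sysbox_node() {
  local node="$1"
  kubectl label nodes "$node" sysbox-install=yes
}

# Example (placeholder node name):
#   label_sysbox_node my-node
```

On a single-node RKE2 cluster, the node to label would be the controller node itself.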

detjensrobert commented 1 year ago

Tested with both a single-node and multi-node cluster. Aside from the downtime that comes with restarting the controller, things work.

Also added rke2-server detection to the uninstaller script. Would you like me to squash the commits together?

rodnymolina commented 1 year ago

Thanks a lot @detjensrobert, looks good at first glance. Will do a deeper review tomorrow.

ctalledo commented 1 year ago

> Tested with both a single-node and multi-node cluster. Aside from the downtime that comes with restarting the controller, things work.

Sounds great, thanks.

> Also added rke2-server detection to the uninstaller script as well. Would you like me to squash the commits together?

No need.

rodnymolina commented 1 year ago

My apologies for the delay in reviewing this one. I've been trying to figure out why I'm unable to stabilize my rke2 setup after installing Sysbox in single-node scenarios. I haven't found a good explanation for it yet, but I believe this is a limitation of RKE2 itself and has nothing to do with Sysbox or the changes in this PR.

NAMESPACE     NAME                                                        READY   STATUS             RESTARTS       AGE
kube-system   pod/cloud-controller-manager-ubuntu-focal-vm-4              1/1     Running            3              82m
kube-system   pod/etcd-ubuntu-focal-vm-4                                  0/1     CrashLoopBackOff   7 (111s ago)   81m
kube-system   pod/helm-install-rke2-canal-52hbk                           0/1     Completed          0              81m
kube-system   pod/helm-install-rke2-coredns-hl9vp                         0/1     Completed          0              81m
kube-system   pod/helm-install-rke2-ingress-nginx-xfb4f                   0/1     Completed          0              81m
kube-system   pod/helm-install-rke2-metrics-server-tnjsv                  0/1     Completed          0              81m
kube-system   pod/helm-install-rke2-snapshot-controller-crd-2r52v         0/1     Completed          0              81m
kube-system   pod/helm-install-rke2-snapshot-controller-rh6kq             0/1     Completed          1              81m
kube-system   pod/helm-install-rke2-snapshot-validation-webhook-7qvj5     0/1     Completed          0              81m
kube-system   pod/kube-apiserver-ubuntu-focal-vm-4                        0/1     CrashLoopBackOff   8 (96s ago)    82m
kube-system   pod/kube-controller-manager-ubuntu-focal-vm-4               1/1     Running            3              82m
kube-system   pod/kube-scheduler-ubuntu-focal-vm-4                        1/1     Running            2              82m
kube-system   pod/rke2-canal-w9p67                                        2/2     Running            2              81m
kube-system   pod/rke2-coredns-rke2-coredns-6b9548f79f-2vzcd              1/1     Running            3              81m
kube-system   pod/rke2-coredns-rke2-coredns-autoscaler-57647bc7cf-hjlhz   1/1     Running            4              81m
kube-system   pod/rke2-ingress-nginx-controller-2d4nq                     1/1     Running            2              80m
kube-system   pod/rke2-metrics-server-78b84fff48-xhqsw                    1/1     Running            4              80m
kube-system   pod/rke2-snapshot-controller-849d69c748-845gx               1/1     Running            3              80m
kube-system   pod/rke2-snapshot-validation-webhook-654f6677b-nf426        1/1     Running            2              80m
kube-system   pod/sysbox-deploy-k8s-4zp4b                                 1/1     Running            2              7m57s

Will go ahead and merge this one now. Thanks @detjensrobert for your contribution.