nestybox / sysbox-pkgr

Sysbox-pkgr repository

Initial attempt at GKE detection in sysbox-deploy-k8s script #108

Closed jamonation closed 1 year ago

jamonation commented 1 year ago

Here's an initial attempt at resolving https://github.com/nestybox/sysbox/issues/680.

The idea is that the sysbox-deploy-k8s.sh script checks whether the link-local Google metadata endpoint at 169.254.169.254 is available, and then uses it to determine whether the host node is part of a GKE cluster.

If so, the script removes the conflicting bridge configuration and sets the correct paths in the crio.network TOML config.
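For anyone following along, the GKE branch described above could look roughly like the sketch below. This is not the code from the PR: the function name, the drop-in file approach, and every example path in the comments are assumptions for illustration. Only the `network_dir` and `plugin_dirs` keys under `[crio.network]` are standard CRI-O settings.

```shell
#!/bin/sh
# Sketch only: all paths are supplied by the caller; none are taken from the PR.
configure_crio_cni_sketch() {
	local cni_conf_dir="$1"  # e.g. /etc/cni/net.d (assumed)
	local cni_bin_dir="$2"   # e.g. /home/kubernetes/bin (assumed GKE plugin path)
	local crio_dropin="$3"   # e.g. /etc/crio/crio.conf.d/99-gke-cni.conf (hypothetical)

	# Remove CRI-O's default bridge config so it cannot conflict with the cluster CNI.
	# With -f, rm stays quiet if no matching file exists.
	rm -f "$cni_conf_dir"/*crio-bridge*

	# Write a drop-in that points CRI-O at the CNI config and plugin directories.
	mkdir -p "$(dirname "$crio_dropin")"
	cat > "$crio_dropin" <<EOF
[crio.network]
network_dir = "$cni_conf_dir"
plugin_dirs = ["$cni_bin_dir"]
EOF
}
```

Writing a separate drop-in file keeps the main crio.conf untouched, which makes the change easy to revert; the real script may well edit the main config in place instead.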

Could definitely use some proper testing from someone who knows their way around the install process!

jamonation commented 1 year ago

🤔 Thinking more about this, the 169.254.169.254 IP is used for instance metadata across cloud providers, e.g. Azure and AWS.

Since my is_gke function does not parse the response from the metadata server, the check would treat any response on another cloud provider as a positive indicator that the node is part of a GKE cluster, whereas it could very well be running in EKS, AKS, or any other environment that has a metadata server at that IP address.

The solution is to check that the HTTP response code is 200, which isn't as clean, but will at least ensure this logic only applies to GKE nodes (with Workload Identity enabled, as noted, of course). I'll update my PR when I have some time to work on this and test it.
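A minimal sketch of what a status-code check like this might look like is below. It is not the function from the PR: the name `check_gke_sketch` and the metadata path (`instance/attributes/cluster-name`) are assumptions chosen for illustration, and the 2-second timeout is arbitrary.

```shell
#!/bin/sh
# Assumed metadata path; the URL actually used in the PR is not shown in this thread.
GKE_METADATA_URL="http://169.254.169.254/computeMetadata/v1/instance/attributes/cluster-name"

# Returns 0 only when the metadata endpoint answers with HTTP 200.
check_gke_sketch() {
	local url="${1:-$GKE_METADATA_URL}"
	local status

	# -s: silent; -o /dev/null: discard the body; -w: print only the status code;
	# --max-time: avoid hanging on hosts that have no metadata server.
	# "|| true" keeps a connection failure from aborting a script run with "set -e".
	status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
		-H "Metadata-Flavor: Google" "$url" 2>/dev/null) || true

	[ "$status" = "200" ]
}
```

Usage would be along the lines of `if check_gke_sketch; then ...; fi`. A metadata server on another provider would be expected to answer this Google-specific path with something other than 200, so the check fails there.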

jamonation commented 1 year ago

Right, I've pushed an updated check_gke function that looks for an HTTP 200 response from the metadata endpoint. This approach isn't the most robust as you noted @kevholmes but it is something to at least cover those clusters with workload identity turned on.

To test, I've done this in a Dockerfile and built/pushed a sysbox:v0.6.2-dev image to my artifact registry:

FROM registry.nestybox.com/nestybox/sysbox-deploy-k8s:v0.6.2

COPY my-patched-sysbox-deploy-k8s.sh /opt/sysbox/scripts/sysbox-deploy-k8s.sh

Then edited the sysbox-deploy-k8s DaemonSet to use my customised sysbox:v0.6.2-dev image. So far so good!

ctalledo commented 1 year ago

> Right, I've pushed an updated check_gke function that looks for an HTTP 200 response from the metadata endpoint. This approach isn't the most robust as you noted @kevholmes but it is something to at least cover those clusters with workload identity turned on.

Out of curiosity, have you had a chance to try on a non-GKE cluster? That would be excellent, but if you haven't we can do it.

yachub commented 1 year ago

FWIW, the fork's default branch is behind by a couple of commits, so after checking out your branch I ran git remote add upstream https://github.com/nestybox/sysbox-pkgr.git, git fetch upstream, and git rebase upstream/master, then ran the make commands to build the image.

IMHO, something like curl -Ls -o /dev/null "http://metadata.google.internal/computeMetadata/v1/instance/image" -H "Metadata-Flavor: Google" && true || false would be a bit more concise than checking specific HTTP response codes against the IP. GCP docs reference the use of metadata.google.internal, and I wouldn't expect that to resolve on other providers, so curl would return a non-zero exit code. But either certainly has the same result! :)
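As a rough sketch, the hostname-based variant could be wrapped in a function like the one below. The function name is hypothetical, and `-f` and `--max-time` are additions to the command above: without `-f`, curl exits 0 on any HTTP response, so the check would rest on DNS resolution alone.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if metadata.google.internal resolves
# and the request returns a non-error HTTP status.
is_gke_by_hostname() {
	# -f: exit non-zero on HTTP errors (status >= 400); -s: silent; -L: follow redirects;
	# --max-time: give up quickly where the hostname resolves but nothing answers.
	curl -fsL -o /dev/null --max-time 2 \
		-H "Metadata-Flavor: Google" \
		"http://metadata.google.internal/computeMetadata/v1/instance/image"
}
```

The function's exit status is curl's exit status, so it can be used directly as `if is_gke_by_hostname; then ...; fi` without the trailing `&& true || false`.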

ctalledo commented 1 year ago

Thanks @yachub; let's go ahead and merge this PR then and close this other PR which does the same thing.