jamonation closed this 1 year ago
🤔 Thinking more about this, the 169.254.169.254 IP is used across cloud providers for instance metadata, e.g. Azure and AWS. Since my is_gke function does not parse the response from the metadata server, the check on another cloud provider would treat any response as a positive indicator that the node is part of a GKE cluster, when it could very well be running in EKS, AKS, or any other environment that has a metadata server at that IP address.
The solution is to check that the HTTP response code is 200, which isn't as clean, but will at least ensure this logic only applies to GKE nodes (with Workload Identity enabled, as noted). I'll update my PR when I have some time to work on this and test it.
Right, I've pushed an updated check_gke function that looks for an HTTP 200 response from the metadata endpoint. This approach isn't the most robust, as you noted @kevholmes, but it at least covers clusters with Workload Identity turned on.
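As a rough illustration, a status-code check along those lines might look like this (the function name, attribute path, and timeout here are assumptions for the sketch, not the exact code in the PR):

```shell
#!/bin/bash

# Hypothetical sketch of a GKE detection check: query the GCE metadata
# server and treat an HTTP 200 as a positive indicator. The
# -w '%{http_code}' flag tells curl to print only the status code.
check_gke() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --connect-timeout 2 \
    -H 'Metadata-Flavor: Google' \
    'http://169.254.169.254/computeMetadata/v1/instance/image')
  [ "$code" = "200" ]
}

if check_gke; then
  echo "GKE node detected"
fi
```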
To test, I've done this in a Dockerfile and built/pushed a sysbox:v0.6.2-dev image to my artifact registry:

```dockerfile
FROM registry.nestybox.com/nestybox/sysbox-deploy-k8s:v0.6.2
COPY my-patched-sysbox-deploy-k8s.sh /opt/sysbox/scripts/sysbox-deploy-k8s.sh
```
Then I edited the sysbox-deploy-k8s DaemonSet to use my customised sysbox:v0.6.2-dev image. So far so good!
> Right, I've pushed an updated check_gke function that looks for an HTTP 200 response from the metadata endpoint. This approach isn't the most robust as you noted @kevholmes but it is something to at least cover those clusters with workload identity turned on.
Out of curiosity, have you had a chance to try on a non-GKE cluster? That would be excellent, but if you haven't we can do it.
FWIW, the fork's default branch is behind by a couple of commits, so after checking out your branch I ran:

```shell
git remote add upstream https://github.com/nestybox/sysbox-pkgr.git
git fetch upstream
git rebase upstream/master
```

then ran the make commands to build the image.
IMHO, something like curl -Ls -o /dev/null "http://metadata.google.internal/computeMetadata/v1/instance/image" -H "Metadata-Flavor: Google" && true || false would be a bit more concise, as opposed to checking specific HTTP response codes against the IP. GCP docs reference the use of metadata.google.internal, and I wouldn't expect that to resolve on other providers, so curl would return a non-zero exit code. But either certainly has the same result! :)
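For comparison, the hostname-based variant could be wrapped up like this (a sketch only; the is_gke_host name and the instance/image attribute path are assumptions):

```shell
#!/bin/bash

# Hypothetical sketch: rely on DNS resolution of metadata.google.internal,
# which GCP documents for its metadata server. With -f, curl exits
# non-zero on a failed lookup or a non-2xx response, so the exit code
# alone answers the question.
is_gke_host() {
  curl -fLs -o /dev/null --connect-timeout 2 \
    -H 'Metadata-Flavor: Google' \
    'http://metadata.google.internal/computeMetadata/v1/instance/image'
}

if is_gke_host; then
  echo "GKE node detected"
else
  echo "not a GKE node"
fi
```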
Thanks @yachub; let's go ahead and merge this PR then and close this other PR which does the same thing.
Here's an initial attempt at resolving https://github.com/nestybox/sysbox/issues/680.
The idea is that the sysbox-deploy-k8s.sh script checks whether the link-local Google metadata endpoint is available at 169.254.169.254 to determine if the host node is part of a GKE cluster. If so, the script removes the conflicting bridge configuration and sets the correct paths in the crio.network TOML config.
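For illustration, the crio.network section being adjusted looks roughly like this (the paths shown are typical GKE values and are assumptions here, not copied from the PR):

```toml
[crio.network]
# Hypothetical example: CNI config and plugin locations on a GKE node
network_dir = "/etc/cni/net.d/"
plugin_dirs = ["/home/kubernetes/bin"]
```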
Could definitely use some proper testing from someone who knows their way around the install process!