Root Cause
After migrating to Artifact Registry (AR), when our GCP-Auth addon is enabled with the mock credentials we use for testing, attempts to pull images from gcr.io/k8s-minikube fail with "unauthorized: authentication failed" errors.
Reproduction
$ export GOOGLE_APPLICATION_CREDENTIALS="/Users/<user>/repo/minikube/test/integration/testdata/gcp-creds.json"
$ export GOOGLE_CLOUD_PROJECT="this_is_fake"
$ export MOCK_GOOGLE_TOKEN="true"
$ minikube start --addons gcp-auth
😄 minikube v1.34.0 on Darwin 14.7 (arm64)
✨ Automatically selected the docker driver. Other choices: qemu2, ssh, vfkit (experimental)
📌 Using Docker Desktop driver with root privileges
👍 Starting "minikube" primary control-plane node in "minikube" cluster
🚜 Pulling base image v0.0.45-1727108449-19696 ...
🔥 Creating docker container (CPUs=2, Memory=4000MB) ...
🐳 Preparing Kubernetes v1.31.1 on Docker 27.3.1 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔗 Configuring bridge CNI (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
▪ Using image registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.4.3
▪ Using image gcr.io/k8s-minikube/gcp-auth-webhook:v0.1.2
🔎 Verifying gcp-auth addon...
📌 Your GCP credentials will now be mounted into every pod created in the minikube cluster.
📌 If you don't want your credentials mounted into a specific pod, add a label with the `gcp-auth-skip-secret` key to your pod configuration.
📌 If you want existing pods to be mounted with credentials, either recreate them or rerun addons enable with --refresh.
🌟 Enabled addons: storage-provisioner, default-storageclass, gcp-auth
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
$ kubectl run --rm registry-test --restart=Never --image=gcr.io/k8s-minikube/busybox -it -- sh -c "wget --spider -S http://registry.kube-system.svc.cluster.local"
pod "registry-test" deleted
error: timed out waiting for the condition
$ kubectl get pods -A
NAMESPACE   NAME            READY   STATUS         RESTARTS   AGE
default     registry-test   0/1     ErrImagePull   0          5s
$ kubectl describe pods registry-test
...
Events:
Type     Reason     Age               From               Message
----     ------     ----              ----               -------
Normal   Scheduled  15s               default-scheduler  Successfully assigned default/registry-test to minikube
Normal   BackOff    13s               kubelet            Back-off pulling image "gcr.io/k8s-minikube/busybox"
Warning  Failed     13s               kubelet            Error: ImagePullBackOff
Normal   Pulling    3s (x2 over 15s)  kubelet            Pulling image "gcr.io/k8s-minikube/busybox"
Warning  Failed     2s (x2 over 14s)  kubelet            Failed to pull image "gcr.io/k8s-minikube/busybox": Error response from daemon: Head "https://gcr.io/v2/k8s-minikube/busybox/manifests/latest": unauthorized: authentication failed
Warning  Failed     2s (x2 over 14s)  kubelet            Error: ErrImagePull
How does this affect the registry test?
Multiple factors come into play. First, as shown in the timeline, a month before the migration to AR the GCP-Auth test was moved to run before the rest of the addon tests. Second, the GCP-Auth test also tries to pull a busybox image from gcr.io/k8s-minikube, which fails because of the issue described above. That failure leads to a call to t.Fatal, which stops the test immediately, so the step that disables the GCP-Auth addon never runs. As a result, when the registry test runs, the GCP-Auth addon with mock credentials is still enabled, and the registry test's command, which also pulls a busybox image, fails.
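A minimal Go sketch of why the addon stays enabled, assuming the disable step is an ordinary statement at the end of the test body rather than a deferred call or a t.Cleanup registration. The names fakeCluster, runLikeTest, inlineDisable and deferredDisable are hypothetical and not taken from minikube's addons_test.go. The sketch relies on one documented behavior of the testing package: t.Fatal calls FailNow, which ends the test with runtime.Goexit, so deferred functions still run but later statements in the test body do not.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// fakeCluster is a hypothetical stand-in for the shared minikube profile
// that the serial and parallel addon tests run against.
type fakeCluster struct {
	gcpAuthEnabled bool
}

// runLikeTest runs body on its own goroutine, as the testing package does
// for each test. The fatal callback mimics t.Fatal: it logs, then calls
// runtime.Goexit, which runs deferred calls but skips the rest of the body.
func runLikeTest(body func(fatal func(msg string))) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		body(func(msg string) {
			fmt.Println("FATAL:", msg)
			runtime.Goexit()
		})
	}()
	wg.Wait()
}

// inlineDisable models a test that disables the addon as its last statement.
func inlineDisable(c *fakeCluster) {
	runLikeTest(func(fatal func(string)) {
		c.gcpAuthEnabled = true
		fatal("busybox pod failed to start")
		c.gcpAuthEnabled = false // never reached after fatal
	})
}

// deferredDisable models the same test with the disable step deferred,
// which still runs when the test is aborted.
func deferredDisable(c *fakeCluster) {
	runLikeTest(func(fatal func(string)) {
		c.gcpAuthEnabled = true
		defer func() { c.gcpAuthEnabled = false }()
		fatal("busybox pod failed to start")
	})
}

func main() {
	a := &fakeCluster{}
	inlineDisable(a)
	fmt.Println("inline disable, addon still enabled:", a.gcpAuthEnabled)

	b := &fakeCluster{}
	deferredDisable(b)
	fmt.Println("deferred disable, addon still enabled:", b.gcpAuthEnabled)
}
```

In the inline case the addon is left enabled for whatever runs next, which is the situation the registry test ran into. In a real Go test, registering the cleanup with t.Cleanup gives the same guarantee as the defer shown here.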
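I don't see the GCP-Auth test failing though: https://storage.googleapis.com/minikube-builds/logs/master/35974/Docker_Linux.html
Correct. The gopogh output for that run shows no failures in GCP-Auth, but the test is actually failing and the failure is being suppressed. The raw JSON logs show the following: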
{"Time":"2024-08-27T23:15:10.547073291Z","Action":"output","Test":"TestAddons/serial/GCPAuth","Output":" addons_test.go:704: (dbg) TestAddons/serial/GCPAuth: waiting 8m0s for pods matching \"integration-test=busybox\" in namespace \"default\" ...\n"}
{"Time":"2024-08-27T23:15:10.550185311Z","Action":"output","Test":"TestAddons/serial/GCPAuth","Output":" helpers_test.go:344: \"busybox\" [3c0f1b89-73c9-47ff-b180-16b49b9cb882] Pending / Ready:ContainersNotReady (containers with unready status: [busybox]) / ContainersReady:ContainersNotReady (containers with unready status: [busybox])\n"}
{"Time":"2024-08-27T23:23:10.5476429Z","Action":"output","Test":"TestAddons/serial/GCPAuth","Output":" helpers_test.go:329: TestAddons/serial/GCPAuth: WARNING: pod list for \"default\" \"integration-test=busybox\" returned: client rate limiter Wait returned an error: context deadline exceeded\n"}
{"Time":"2024-08-27T23:23:10.54769183Z","Action":"output","Test":"TestAddons/serial/GCPAuth","Output":" addons_test.go:704: ***** TestAddons/serial/GCPAuth: pod \"integration-test=busybox\" failed to start within 8m0s: context deadline exceeded ****\n"}
{"Time":"2024-08-27T23:23:10.547702276Z","Action":"output","Test":"TestAddons/serial/GCPAuth","Output":" addons_test.go:704: (dbg) Run: out/minikube-linux-amd64 status --format={{.APIServer}} -p addons-029048 -n addons-029048\n"}
{"Time":"2024-08-27T23:23:10.84226332Z","Action":"output","Test":"TestAddons/serial/GCPAuth","Output":" addons_test.go:704: TestAddons/serial/GCPAuth: showing logs for failed pods as of 2024-08-27 23:23:10.842143419 +0000 UTC m=+743.194308226\n"}
{"Time":"2024-08-27T23:23:10.842291126Z","Action":"output","Test":"TestAddons/serial/GCPAuth","Output":" addons_test.go:704: (dbg) Run: kubectl --context addons-029048 describe po busybox -n default\n"}
{"Time":"2024-08-27T23:23:10.909691302Z","Action":"output","Test":"TestAddons/serial/GCPAuth","Output":" addons_test.go:704: (dbg) kubectl --context addons-029048 describe po busybox -n default:\n"}
This is due to how gopogh handles parent tests: if a test has any child tests, its own result is suppressed. This prevents a single child test failure from also showing up as a failure for every parent in the chain and cluttering the output. For example, if two tests failed and each had four parents, gopogh shows two failures instead of 10 (2 × (1 + 4)). The GCP-Auth test has a child test, Namespaces, so any failure in the GCP-Auth test itself is suppressed.
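The suppression rule described here can be modeled in a few lines of Go. This is a hypothetical sketch of the rule as stated, not gopogh's actual implementation; the result type and the reportedFailures and hasChild helpers are illustrative names. It assumes test names follow Go's slash-separated subtest naming, so the child of TestAddons/serial/GCPAuth is TestAddons/serial/GCPAuth/Namespaces.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// result is a hypothetical, simplified test result.
type result struct {
	Name   string // full Go test name, e.g. "TestAddons/serial/GCPAuth"
	Failed bool
}

// hasChild reports whether any other test is nested under name.
func hasChild(name string, all []result) bool {
	prefix := name + "/"
	for _, r := range all {
		if strings.HasPrefix(r.Name, prefix) {
			return true
		}
	}
	return false
}

// reportedFailures applies the rule: a failing test is shown only if it
// has no child tests, whether or not those children failed.
func reportedFailures(all []result) []string {
	var shown []string
	for _, r := range all {
		if r.Failed && !hasChild(r.Name, all) {
			shown = append(shown, r.Name)
		}
	}
	sort.Strings(shown)
	return shown
}

func main() {
	// Two failing leaf tests, each with four failing parents: 10 failures.
	var chains []result
	for _, chain := range []string{"A/B/C/D/E", "F/G/H/I/J"} {
		parts := strings.Split(chain, "/")
		for i := range parts {
			chains = append(chains, result{Name: strings.Join(parts[:i+1], "/"), Failed: true})
		}
	}
	fmt.Println(len(chains), reportedFailures(chains)) // 10 [A/B/C/D/E F/G/H/I/J]

	// GCP-Auth fails, but its child Namespaces exists, so nothing is shown.
	gcp := []result{
		{Name: "TestAddons/serial/GCPAuth", Failed: true},
		{Name: "TestAddons/serial/GCPAuth/Namespaces", Failed: false},
	}
	fmt.Println(reportedFailures(gcp)) // []
}
```

The second case is the blind spot: the parent's own failure is dropped because the rule only looks at whether children exist, not at whether any of them failed.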
Timeline
Jul 25, 2024
Moved the serial addon tests (including GCP-Auth) to run before the parallel addon tests
Aug 27, 2024
Migrated GCP image hosting from GCR to Artifact Registry (AR)
Action Items