neondatabase / autoscaling

Postgres vertical autoscaling in k8s
Apache License 2.0

Docker rate limiting causes e2e-tests to fail #975

Closed sharnoff closed 1 week ago

sharnoff commented 2 weeks ago

Problem

We are sometimes rate limited by Docker Hub in the e2e tests. When this happens, image pulls fail, and so the entire e2e test job fails.

As a recent example, I saw a couple of cases where deploying the components failed with:

Waiting for daemon set "neonvm-device-plugin" rollout to finish: 0 of 3 updated pods are available...
Error: The action 'deploy components' has timed out after 3 minutes.

and when looking at the events, we see:

LAST SEEN   TYPE      REASON           OBJECT                           MESSAGE
2m50s       Normal    Scheduled        pod/neonvm-device-plugin-blskl   Successfully assigned neonvm-system/neonvm-device-plugin-blskl to k3d-neonvm-agent-0
2m49s       Normal    AddedInterface   pod/neonvm-device-plugin-blskl   Add eth0 [10.0.0.154/32] from cilium
75s         Normal    Pulling          pod/neonvm-device-plugin-blskl   Pulling image "squat/generic-device-plugin"
72s         Warning   Failed           pod/neonvm-device-plugin-blskl   Failed to pull image "squat/generic-device-plugin": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/squat/generic-device-plugin:latest": failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/squat/generic-device-plugin/manifests/sha256:ba6f0b4cf6c858d6ad29ba4d32e4da11638abbc7d96436bf04f582a97b2b8821: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
72s         Warning   Failed           pod/neonvm-device-plugin-blskl   Error: ErrImagePull
57s         Warning   Failed           pod/neonvm-device-plugin-blskl   Error: ImagePullBackOff
45s         Normal    BackOff          pod/neonvm-device-plugin-blskl   Back-off pulling image "squat/generic-device-plugin"
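
To tell this failure apart from an ordinary rollout timeout, the job could scan the events output for the rate-limit message. The following is only a rough sketch: the function name `rate_limited_images` is hypothetical, and the two marker strings are copied from the event above, so other registries or future Docker Hub responses may be worded differently.

```python
import re

# Markers copied from the failing event shown above (assumption: the wording stays stable).
RATE_LIMIT_MARKERS = ("429 Too Many Requests", "toomanyrequests")

# Matches the image name in kubelet's 'Failed to pull image "<name>"' message.
IMAGE_RE = re.compile(r'Failed to pull image "([^"]+)"')


def rate_limited_images(events_text: str) -> list[str]:
    """Return the images whose pull failed with a rate-limit message, in first-seen order."""
    images: list[str] = []
    for line in events_text.splitlines():
        # Skip events that do not mention the rate limit at all.
        if not any(marker in line for marker in RATE_LIMIT_MARKERS):
            continue
        match = IMAGE_RE.search(line)
        if match and match.group(1) not in images:
            images.append(match.group(1))
    return images
```

Fed the output of `kubectl get events -n neonvm-system`, a non-empty result would let the job report "rate limited by Docker Hub" rather than a bare timeout.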


Potential solutions

Maybe we can specify credentials for Docker Hub with this k3d registries configuration file? https://k3d.io/v5.6.0/usage/registries/
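
As a rough sketch of what that could look like, the helper below renders a registries file with a `configs` / `auth` / `username` / `password` layout, as described on the linked k3d page. Several things here are assumptions rather than tested facts: the function name `render_registries_config` is hypothetical, and the default key `registry-1.docker.io` is a guess at the host name that has to match for Docker Hub pulls (it may need to be `docker.io` instead).

```python
import json


def render_registries_config(
    username: str,
    token: str,
    registry_host: str = "registry-1.docker.io",  # assumption: may need to be "docker.io"
) -> str:
    """Render a registries config carrying credentials for a single registry host."""
    # JSON string literals are valid YAML double-quoted scalars, so json.dumps
    # gives safe quoting without requiring a YAML library.
    q = json.dumps
    lines = [
        "configs:",
        f"  {q(registry_host)}:",
        "    auth:",
        f"      username: {q(username)}",
        f"      password: {q(token)}",
        "",  # trailing newline
    ]
    return "\n".join(lines)
```

The rendered text could be written to a file in the e2e job, with the credentials read from CI secrets, and then passed to `k3d cluster create` via its `--registry-config` flag. Whether authenticated pulls raise the limit enough for our CI volume would still need to be checked.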

We might also want to look into implementing this for kind, but that's lower priority because we aren't regularly using it in CI.