rust-lang / infra-team

Coordination repository for the Rust infra team
https://www.rust-lang.org/governance/teams/infra
Apache License 2.0

Figure out docker hub caching #176

Open ehuss opened 2 hours ago

ehuss commented 2 hours ago

In GitHub Actions we periodically run into the Docker Hub rate limit introduced in November 2020 (the error is "429 Too Many Requests"). This hits any repo using Docker, such as rust-lang/rust and rust-lang/cargo.

The anonymous Docker Hub rate limit is 100 pulls / 6 hours / IP. source
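For a rough sense of headroom, a back-of-envelope sketch (the pulls-per-job figure is a made-up assumption, not a measured number):

```shell
# Hypothetical: how many CI jobs fit under each Docker Hub limit,
# assuming each job does about 3 pulls (made-up number).
PULLS_PER_JOB=3
ANON_LIMIT=100    # anonymous: per IP / 6 hours
AUTH_LIMIT=200    # authenticated: per account / 6 hours
echo "anonymous:     $(( ANON_LIMIT / PULLS_PER_JOB )) jobs per 6h per IP"
echo "authenticated: $(( AUTH_LIMIT / PULLS_PER_JOB )) jobs per 6h per account"
```

Note that hosted GitHub Actions runners come from a shared IP pool, so the anonymous per-IP budget is effectively shared with unrelated jobs landing on the same IP.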

There have been a few solutions proposed:

  1. Authenticate with Docker Hub. This changes the rate limit to 200 pulls / 6 hours / account. I do not know if that is sufficient for our needs across the org. I get the impression that infra team members do not like this option.
  2. Mirror in GitHub Container registry (docs). From what I can tell, there doesn't seem to be a read limit. There is a 10GB/layer limit, which I think is fine. The main drawback is that it requires manually updating the images (like when new Ubuntu images are released).
    • I do not think we update the base images very often. I do not know if the infra team has a pre-existing mechanism for uploading, or how difficult that is to do manually.
  3. Use Amazon ECR. I think the ECR Public Gallery has many of the images we typically use. I am uncertain, but I think the unauthenticated rate limit is 500GB/month (per IP?) source. We could authenticate using OIDC, which raises the cap to 5 TB / month.
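For comparison, pulling the same base image under each option would look roughly like this (the ghcr.io path is hypothetical and assumes the infra team mirrors images under the rust-lang org; Docker official images on the ECR Public Gallery live under `public.ecr.aws/docker/library/`):

```shell
# Option 1: Docker Hub directly, authenticated to get the 200-pull limit.
# DOCKERHUB_USER / DOCKERHUB_TOKEN would come from CI secrets (assumption).
echo "$DOCKERHUB_TOKEN" | docker login --username "$DOCKERHUB_USER" --password-stdin
docker pull ubuntu:22.04

# Option 2: a hypothetical mirror in GitHub Container Registry.
docker pull ghcr.io/rust-lang/ubuntu:22.04

# Option 3: the ECR Public Gallery copy of the Docker official image.
docker pull public.ecr.aws/docker/library/ubuntu:22.04
```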

I do not know how ghcr and Amazon ECR compare in performance and reliability.

Would the infra team have a preference here? I prefer whatever is easiest 😜. ghcr seems appealing to me if the infra team is ok with handling uploading new images.
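If option 2 is chosen, the manual upload step is fairly small; a sketch of mirroring one base image (the `ghcr.io/rust-lang/...` target and the token variable are assumptions, not an existing mechanism):

```shell
# Hypothetical: mirror ubuntu:22.04 into ghcr under the rust-lang org.
# GHCR_TOKEN would need the write:packages scope (assumption).
echo "$GHCR_TOKEN" | docker login ghcr.io --username "$GITHUB_ACTOR" --password-stdin
docker pull ubuntu:22.04
docker tag ubuntu:22.04 ghcr.io/rust-lang/ubuntu:22.04
docker push ghcr.io/rust-lang/ubuntu:22.04
```

This is the step that would need re-running whenever the upstream image updates.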

Mark-Simulacrum commented 2 hours ago

> Authenticate with Docker Hub. This changes the rate limit to 200 pulls / 6 hours / account. I do not know if that is sufficient for our needs across the org. I get the impression that infra team members do not like this option.

Yeah, I'd prefer to avoid ~personal/team accounts on Docker Hub; it seems like unnecessary hassle, and 200 pulls / 6 hours also doesn't feel high enough to fully solve the problem.

> Use Amazon ECR. I think the ECR Public Gallery has many of the images we typically use. I am uncertain, but I think the unauthenticated rate limit is 500GB/month (per IP?) source. We could authenticate using OIDC, which raises the cap to 5 TB / month.

It is per IP: "Data transferred out from public repositories is limited by source IP when an AWS account is not used." (https://aws.amazon.com/ecr/pricing/)

5 TB isn't a cap; it's just the free tier. Past that we start paying, but I'd expect that in practice we wouldn't use much beyond 5 TB (if at all -- that's a pretty large amount of data).

For ECR, if an image isn't in the existing public gallery, we could probably configure pull-through caching (https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache.html), though it sounds like that would require authentication. I'm much more comfortable with that than with ending up needing multiple Docker Hub accounts to distribute load (as seems likely if e.g. rust-lang/rust uses this).
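For reference, the pull-through cache rule itself is a one-time setup; a sketch with the AWS CLI (the repository prefix, region, account ID, and secret ARN are placeholders, and the Secrets Manager credential is what makes the Docker Hub upstream authenticated):

```shell
# One-time: create a pull-through cache rule for Docker Hub.
# The "docker-hub" prefix and the ARN below are placeholders.
aws ecr create-pull-through-cache-rule \
    --ecr-repository-prefix docker-hub \
    --upstream-registry-url registry-1.docker.io \
    --credential-arn arn:aws:secretsmanager:us-east-1:111111111111:secret:ecr-pullthroughcache/docker-hub

# Afterwards, CI would pull through the cache instead of Docker Hub directly:
docker pull 111111111111.dkr.ecr.us-east-1.amazonaws.com/docker-hub/library/ubuntu:22.04
```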

> I do not think we update the base images very often. I do not know if the infra team has a pre-existing mechanism for uploading, or how difficult that is to do manually.

I think base images get updated pretty regularly? At least I'd expect that e.g. ubuntu:22.04 is getting updates constantly -- it was updated just 8 days ago (way after initial release): https://hub.docker.com/layers/library/ubuntu/22.04/images/sha256-3d1556a8a18cf5307b121e0a98e93f1ddf1f3f8e092f1fddfd941254785b95d7?context=explore