kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
https://kestra.io
Apache License 2.0

Implement Docker Hub Proxy for Docker Image Caching #3315

Open slamer59 opened 3 months ago

slamer59 commented 3 months ago

Feature description

In CI/CD workflows run on Kestra, it is common to hit Docker Hub rate limits when the same Docker images are pulled repeatedly during builds. A Docker Hub proxy feature within Kestra could mitigate this by caching images, which would improve build performance and avoid the rate limit.

Problem Statement

Currently, CI/CD workflows relying on Docker images from Docker Hub (or any other registry) can face rate limits, causing delays and disruptions in build processes. This limitation hinders the scalability and efficiency of automated builds, especially in large-scale projects with frequent image pulls.

For example, anonymous users are limited to 100 pulls per 6 hours per IP address.
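To make the limit concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that every run pulls each of its images from Docker Hub with no local cache, and that all runs share one IP address. The function names and the example workload are hypothetical; only the 100 pulls per 6 hours figure comes from the text above.

```python
import math

# Figure quoted above for anonymous users (per IP address).
ANONYMOUS_PULL_LIMIT = 100
WINDOW_HOURS = 6


def pulls_per_window(runs_per_hour: float, images_per_run: int,
                     window_hours: int = WINDOW_HOURS) -> int:
    """Estimate pulls in one rate-limit window, assuming no local cache."""
    return math.ceil(runs_per_hour * window_hours * images_per_run)


def exceeds_limit(runs_per_hour: float, images_per_run: int,
                  limit: int = ANONYMOUS_PULL_LIMIT) -> bool:
    """Return True when the estimated pulls exceed the limit."""
    return pulls_per_window(runs_per_hour, images_per_run) > limit


# Hypothetical workload: a flow every 5 minutes (12 per hour), 2 images each.
print(pulls_per_window(12, 2))  # 144
print(exceeds_limit(12, 2))     # True
```

Under these assumptions, even a single flow scheduled every five minutes would go over the anonymous limit within one window.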

Proposed Solution

Integrate a Docker Hub proxy mechanism within Kestra that allows caching Docker images locally. This proxy should intelligently manage image requests, reducing the need for repetitive pulls from Docker Hub and optimizing build times.

Benefits

  1. Improved Build Performance: By caching Docker images locally, CI/CD builds will experience faster execution times as they won't be dependent on external Docker Hub requests for every build.
  2. Reduced Dependency on External Sources: Minimizing direct dependencies on Docker Hub reduces the impact of rate limits and network latency, enhancing the reliability of CI/CD pipelines.
  3. Scalability: The Docker Hub proxy feature enables seamless scaling of CI/CD infrastructure without concerns about exceeding Docker Hub rate limits.
  4. Enhanced Developer Experience: Developers can focus on coding and testing without being hindered by external service limitations, leading to a smoother development workflow.

Implementation Considerations

  1. Proxy Configuration: Provide a straightforward configuration option within Kestra's settings to define the Docker Hub proxy.
  2. Cache Management: Implement intelligent caching strategies to manage cached Docker images efficiently, considering expiration policies and cache invalidation mechanisms.
  3. Logging and Monitoring: Include logging and monitoring capabilities to track proxy activity, cache hits/misses, and overall performance metrics.
  4. Documentation: Ensure comprehensive documentation detailing how users can configure and utilize the Docker Hub proxy feature in their CI/CD workflows.
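As an illustration of points 2 and 3, the sketch below shows one possible shape for an expiring cache with hit/miss counters. It is not Kestra code and not how a real registry proxy stores image layers: `PullThroughCache`, `fetch`, and `ttl_seconds` are hypothetical names, and the "image" is reduced to an opaque byte string to keep the example short.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class PullThroughCache:
    """Toy pull-through cache: time-based expiration plus hit/miss counters."""

    fetch: Callable[[str], bytes]        # hypothetical upstream pull function
    ttl_seconds: float = 3600.0          # expiration policy (assumed value)
    clock: Callable[[], float] = time.monotonic
    hits: int = 0
    misses: int = 0
    _entries: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)

    def get(self, image: str) -> bytes:
        now = self.clock()
        entry = self._entries.get(image)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1               # served locally, no upstream request
            return entry[1]
        self.misses += 1                 # absent or expired: pull upstream
        data = self.fetch(image)
        self._entries[image] = (now, data)
        return data

    def invalidate(self, image: str) -> None:
        """Explicit cache invalidation for a single image reference."""
        self._entries.pop(image, None)


if __name__ == "__main__":
    cache = PullThroughCache(fetch=lambda image: f"layers of {image}".encode())
    cache.get("python:3.12")
    cache.get("python:3.12")
    print(cache.hits, cache.misses)      # 1 1
```

The `hits` and `misses` counters are the kind of values that the logging and monitoring item could expose as metrics.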

Use Case

Consider a scenario where a CI/CD pipeline in Kestra regularly pulls Docker images from Docker Hub for building and testing applications. With the Docker Hub proxy feature enabled, these images are cached locally, significantly reducing build times and improving overall pipeline efficiency.

This feature request aims to enhance the functionality and performance of Kestra in CI/CD environments, providing users with a seamless Docker image caching solution to optimize build workflows.

Relevant Documentation

  1. Docker proxy setup
  2. GitLab proxy setup (in GitLab CI/CD)
  3. Harbor proxy setup

kriko commented 3 months ago

Organizations that already have a Docker Hub repository mirror set up can use it by specifying the mirror in the image name. However, this works only for unauthenticated internal registry mirrors.

There are a few options:

I am not sure that bundling a Docker registry pull-through cache is that important, since there are several alternatives: see the blog post, or, if you have a GitLab installation, GitLab provides the Dependency Proxy.

kriko commented 3 months ago

Additional information regarding authentication against private repository mirrors: authentication only works if the image is referenced through the mirror, e.g. mirror.company.com/image:tag. If the image is referenced plainly as image:tag, the repository mirror works only if it is unauthenticated. This seems to be a limitation of dockerd and is still an unresolved issue. Authentication without specifying the mirror in the image name only works for authenticated Docker Hub accounts (e.g. paid accounts).
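A minimal Python sketch of what "referencing the image through the mirror" could look like when image names are rewritten before a task runs. The helper names are hypothetical and mirror.company.com is the placeholder host from the comment above. The host detection follows the usual Docker convention that the first path component is a registry host if it contains a dot or a colon, or is `localhost`. Some mirrors may expect a different path layout, which this sketch does not handle.

```python
def has_registry_host(image: str) -> bool:
    """Return True if the reference already names a registry host."""
    first, sep, _ = image.partition("/")
    if not sep:
        # No slash at all, e.g. "image:tag": a plain Docker Hub reference.
        return False
    return "." in first or ":" in first or first == "localhost"


def via_mirror(image: str, mirror: str = "mirror.company.com") -> str:
    """Prefix plain Docker Hub references with the mirror host.

    References that already point at a registry, including the mirror
    itself, are returned unchanged.
    """
    if has_registry_host(image):
        return image
    return f"{mirror}/{image}"


print(via_mirror("image:tag"))        # mirror.company.com/image:tag
print(via_mirror("ghcr.io/org/app"))  # ghcr.io/org/app
```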