loft-sh / devpod

Codespaces but open-source, client-only and unopinionated: Works with any IDE and lets you use any cloud, kubernetes or just localhost docker.
https://devpod.sh
Mozilla Public License 2.0

Pod 823/cache #1245

Closed · bkneis closed this 2 weeks ago

bkneis commented 4 weeks ago

This PR contains the changes needed to use remote caching via a registry to speed up build times. It supports docker and kubernetes (via kaniko) and uses the context option REGISTRY_CACHE as the registry URL. Other caching options are not exposed for now in order to keep the behaviour predictable.
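Setting the option would look something like this (a sketch assuming it is configured like other DevPod context options; the registry ref below is just a placeholder):

```sh
# Point DevPod at a registry to use as the remote build cache
# (REGISTRY_CACHE is the context option added in this PR; the URL is a placeholder)
devpod context set-options -o REGISTRY_CACHE=ghcr.io/my-org/devpod-cache
```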

The PR was tested using the following workflow:

Without cache

  1. Build examples/build: 2m35s
  2. Add ghcr.io/devcontainers/features/github-cli:1 as a feature to devcontainer.json: 2m45s
  3. Add files causing the build context to change (echo "test" > examples/build/app/test): 2m52s

With cache

  1. Build examples/build: 2m13s
  2. Add ghcr.io/devcontainers/features/github-cli:1 as a feature to devcontainer.json: 1m25s
  3. Add files causing the build context to change (echo "test" > examples/build/app/test): 13s
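For anyone wanting to reproduce the comparison, the steps look roughly like this (a sketch; I'm assuming devpod build is used for each step, and the devcontainer.json path is whatever the example uses):

```sh
# 1. Initial build of the example project (cold, or warmed by the registry cache)
devpod build examples/build

# 2. Add the github-cli feature to the example's devcontainer.json, then rebuild
#    ("ghcr.io/devcontainers/features/github-cli:1" under "features")
devpod build examples/build

# 3. Change the build context and rebuild once more
echo "test" > examples/build/app/test
devpod build examples/build
```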

As you can see, step 1 takes almost the same time, since no cache exists yet. Step 2 is more interesting: we saved around 50% of the build time thanks to the cache, but 1m16s of the run was spent uploading the cache back to the registry. This shows it is important for us to either omit the --cache-to parameter for up (but not build), or defer pushing the cache manifest until the end of the command, in the background, once the workspace / IDE has already launched.
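For context, the flags under discussion map onto BuildKit's registry cache importer/exporter. Roughly (the registry ref is a placeholder, and the exact invocation DevPod assembles may differ):

```sh
# What `devpod build` effectively asks BuildKit to do: consume and export the cache
docker buildx build \
  --cache-from type=registry,ref=ghcr.io/my-org/devpod-cache \
  --cache-to type=registry,ref=ghcr.io/my-org/devpod-cache,mode=max \
  -t my-workspace-image .

# What `devpod up` would do with --cache-to omitted: consume the cache only,
# so the workspace starts without waiting for the cache upload
docker buildx build \
  --cache-from type=registry,ref=ghcr.io/my-org/devpod-cache \
  -t my-workspace-image .
```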

Also note that I am using my local docker / kind cluster with a remote registry; when the registry is closer to the devcontainer I would expect even greater savings in build time thanks to shorter download times.

Lastly, step 3 simulates a common problematic workflow: a user updates the build context and then ups a workspace. Here we see significant savings, since only the last layer needs to be rebuilt instead of the entire image being rebuilt because of a cache miss.

EDIT: I have now implemented a boolean ExportCache to toggle the --cache-to parameter, so we only upload the cache when running build, not up. This gives up even faster workspace start times, since we no longer wait for the cache to be uploaded.

Also note that for machine providers I needed to enable the containerd snapshotter flag in the docker daemon; this is done during initWorkspace. For non-machine providers like local docker we expect the user to enable this themselves. When trying to use the remote cache without it, docker prints a WARNING.
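For local docker, enabling the containerd image store is standard daemon configuration (not something this PR does for you), e.g.:

```sh
# Enable the containerd snapshotter in /etc/docker/daemon.json:
#
#   {
#     "features": {
#       "containerd-snapshotter": true
#     }
#   }
#
# then restart the daemon so the setting takes effect
sudo systemctl restart docker
```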

NOTE: We need to merge https://github.com/loft-sh/dockerless/pull/27 first and update the dockerless release tag in single.go

Lastly, I have updated the CalculatePrebuildHash function to traverse the parsed Dockerfile and extract any file paths that could affect the build context. These files are then used as an "include" list when deciding which paths contribute to the hashed contents. This should result in fewer cache misses and allow developers to reuse similar images.
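To illustrate with a hypothetical Dockerfile (not the actual examples/build one):

```sh
cat Dockerfile
# FROM node:18
# COPY package.json ./
# COPY app/ ./app/
# RUN npm ci

# Only the paths referenced above (package.json, app/) would feed into
# CalculatePrebuildHash, so touching an unrelated file in the same context
# directory no longer changes the hash or causes a cache miss.
```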

janekbaraniewski commented 3 weeks ago

LGTM!

pascalbreuninger commented 3 weeks ago

@bkneis approved with comments, feel free to ignore if not applicable

bkneis commented 3 weeks ago

@pascalbreuninger fab thanks! Just finishing testing with a remote k8s cluster on GKE, then it should be ready to merge. Do I need to update any file to reference the new version of the kubernetes driver? i.e. https://github.com/loft-sh/devpod-provider-kubernetes/pull/55

pascalbreuninger commented 3 weeks ago

@bkneis nope, that's done by releasing a new version of the kubernetes provider over in the other repo👍 We'll need to wait until we've released it though, right?

bkneis commented 2 weeks ago

@pascalbreuninger just finished testing with GKE :partying_face: It was my Autopilot cluster giving me issues, killing workspaces with return code 137 (OOM). Using the new GKE cluster, workspaces spin up no problem and use the cache. In the end I didn't need to make any changes to the kubernetes driver, only dockerless, so this PR is good to go IMO