stepchowfun / toast

Containerize your development and continuous integration environments. 🥂
Other
1.56k stars 39 forks source link

Shared container/host download cache #447

Open lynaghk opened 1 year ago

lynaghk commented 1 year ago

My toast tasks spent a lot of time repeatedly downloading language dependencies into, e.g., ~/.cargo/registry (Rust) and ~/.m2/ (Java / Clojure). I've run into two problems trying to share these host dependency caches with containers.

Consider a task definition like:

image: ubuntu:22.04
tasks:
  compile-rust:
    dependencies:
      - install-rust
    mount_paths:
      - ~/.cargo/registry:/root/.cargo/registry
    command: |
      cargo build

Problem 1: Containers with mount_paths are not cachable.

Running toast compile-rust will fail with:

[ERROR] Unable to parse file /Users/dev/software/toast/scratch/toast.yml. Reason: Task compile-rust has mount_paths but does not disable caching. To fix this, set cache: false for this task.

I understand the intent here --- the contents of the mount_paths are untracked by toast and so in general caching build output could lead to reproducibility problems. However, for the specific need of sharing the host's already-downloaded dependencies, it feels a bit different. We're trusting language tooling (cargo, mvn, whatever) to handle downloading/validating libraries in a sensible way. Since toast is already fine caching containers that download arbitrary stuff from the Internet (i.e., isn't aiming for bitwise-reproducible dependency tracking), it feels reasonable to ignore some of the "inputs" for the purposes of caching.

Perhaps we should break out this need out as a separate concept, say cache_paths. Then we'd have:

Alternatively, we could allow specifying a "don't cache this" option for specific input paths or a "please allow caching" for containers that use mount_paths, but I think it'll be clearer to have an explicitly named concept.

Problem 2: Intended host paths vary.

I pulled a fast one with:

mount_paths:
  - ~/.cargo/registry:/root/.cargo/registry

It actually needs to be a path relative to the toast file (../../.cargo/registry) or an absolute one (/Users/kevin/.cargo/registry). Unfortunately, both make it difficult to make toast-driven projects portable, since it'll force you to checkout the project to a specific folder or run as a specific user.

The obvious fix in this case is to write a bit of code to expand ~ into the executing user's home directory, though there be more general ways to solve this need (e.g., allowing for interpretation of environment variables in toast config values beyond just the command string).

If you agree that the need is in-scope for toast, I'm happy to submit PRs in the next few days.

stepchowfun commented 1 year ago

Hi @lynaghk, thanks for these two feature requests! I can understand the desire for both. Here are my rough thoughts on them:

For Problem 1, one simple possibility is to change the error into a warning. I'm on the fence about it, though. Warnings tend to be ignored, although in this case that would be the point. If we changed this error into a warning, there are probably other similar errors we'd also want to change into warnings.

For Problem 2, I think supporting environment variable interpolation is the way to go, rather than treating home directories as a special case. You'd have to write $HOME instead of ~, but I think that's fine. However, I'm concerned about the complexity of adding this feature.

Sorry for the non-committal response. I'm still not sure about either.

lynaghk commented 1 year ago

No worries! There's no huge rush on my end, so definitely feel free to have a think.

Re: Problem 1, warnings don't feel like a long-term "solution", since they don't allow one to distinguish intent --- if a colleague had written the toast file, I wouldn't be sure if they'd made a mistake or if they deliberately wanted to introduce not-tracked-by-toast state into the container.

Re: Problem 2, the other tricky part of environment vars from a UX perspective is that the ones in command will variable will be interpreted within the container shell, whereas all of the other ones (presumably) will be interpreted w.r.t. the context that is running the toast command. But hopefully that'll make sense to everyone.

I already have a private fork of toast running Bollard (as discussed in https://github.com/stepchowfun/toast/issues/440#issuecomment-1200268212) so I may clean that up and have a crack at some of these other features.

I'll report on my experiences and if you want we can discuss how to merge some of the back upstream. Cheers!

stepchowfun commented 1 year ago

Those are good points!