yarnpkg / berry

📦🐈 Active development trunk for Yarn ⚒
https://yarnpkg.com
BSD 2-Clause "Simplified" License

[Feature] Download and cache dependencies from yarn.lock lockfile #5998

Open GauBen opened 10 months ago

GauBen commented 10 months ago

Describe the user story

Building a Docker image of a Yarn-managed monorepo does not make efficient use of layer caching. As of now there seem to be two options:

# Copy the whole monorepo and do everything at once
COPY . .
RUN yarn install && yarn build

This downloads the dependencies again every time any file changes.

# Copy only package files
COPY package.json yarn.lock ./
COPY packages/a/package.json ./packages/a/
COPY packages/b/package.json ./packages/b/
COPY packages/c/package.json ./packages/c/
COPY packages/d/package.json ./packages/d/
# ... endless and hand-maintained list
RUN yarn install

# Build everything
COPY . .
RUN yarn build

This properly leverages layer caching but requires maintaining the Dockerfile by hand.
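
One way to avoid the hand-maintained list, sketched here as an untested assumption rather than an established recipe, is a multi-stage build: a first stage copies the whole repository and deletes everything except the manifests and the lockfile, and the build stage copies the result. The stage names, the node:20 base image, and the /app path are placeholders. The sketch also assumes BuildKit (which caches COPY --from by content), a .dockerignore that excludes node_modules, and a project that needs no other files at install time (a .yarnrc.yml or a checked-in Yarn release would have to be kept too).

```dockerfile
# syntax=docker/dockerfile:1

# Stage 1 (hypothetical name "manifests"): reduce the repo to its manifests
FROM node:20 AS manifests
WORKDIR /app
COPY . .
# Delete every file that is not a package.json or yarn.lock,
# then remove the directories left empty
RUN find . -type f ! -name package.json ! -name yarn.lock -delete \
 && find . -type d -empty -delete

# Stage 2: install from the manifests only, then build
FROM node:20 AS build
WORKDIR /app
# Assumes the project pins Yarn via the packageManager field
RUN corepack enable
COPY --from=manifests /app ./
RUN yarn install --immutable
COPY . .
RUN yarn build
```

The manifests stage reruns whenever any file changes, but its output only differs when a package.json or yarn.lock does, so the yarn install layer should stay cached otherwise.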

Describe the solution you'd like

# Copy only the lock file
COPY yarn.lock .
RUN yarn cache download

# Create node_modules and build everything
COPY . .
RUN YARN_ENABLE_OFFLINE_MODE=1 yarn install && yarn build

This will only re-download the dependencies when yarn.lock changes, which seems to offer the best of both solutions above, without the hassle.

Reference: https://pnpm.io/cli/fetch

Describe the drawbacks of your solution

This builds on the existing yarn cache clean command and implies adding new cache management commands (e.g. download, diff, audit...).

Describe alternatives you've considered

Cache commands should probably all live in Yarn core, but this could be shipped as a plugin if the plugin API allows it.

merceyz commented 10 months ago

You can avoid downloading all the packages anew by mounting a cache for the install step:

RUN \
  --mount=type=cache,target=/root/.yarn \
  yarn

With this cache, adding a new dependency will only fetch that one dependency.
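
For context, the cache mount above could sit in a full Dockerfile roughly as follows. This is a minimal sketch, not a tested setup: the node:20 base image, the /app path, and the corepack step are assumptions, as is the premise that Yarn's global cache is enabled and lives under /root/.yarn. It also assumes the node-modules linker, because the contents of a cache mount are not part of the final image, so a Plug'n'Play install that reads packages from the global cache at runtime would not find them there.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
# Assumes the project pins Yarn via the packageManager field
RUN corepack enable
COPY . .
# The mount persists between builds, so only packages missing from
# the cache are downloaded; the mounted directory itself is not
# stored in the image
RUN --mount=type=cache,target=/root/.yarn \
    yarn install --immutable && yarn build
```

Cache mounts require BuildKit. The install layer still reruns whenever a file changes, but it pulls packages from the mounted cache instead of the network.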

GauBen commented 10 months ago

Thanks @merceyz for the advice, I will take a look. Relying on the Docker build cache rather than the host machine cache would probably be easier for my use case, but both should be possible.

GauBen commented 1 month ago

I'm back with a prototype!

https://gist.github.com/GauBen/8a9847abfbee0138ed2c5fa04812a500

It's currently very experimental, but it works for my use case.

merceyz: The Project class has a fetchEverything function you might be able to use there. https://github.com/yarnpkg/berry/blob/2a4786715871e2febd03113249318637650aa5ab/packages/yarnpkg-core/sources/Project.ts#L1095-L1208