yarnpkg / berry

📦🐈 Active development trunk for Yarn ⚒
https://yarnpkg.com

Individual lockfile per workspace #1223

Open migueloller opened 4 years ago

migueloller commented 4 years ago

Describe the user story

I believe the best user story is captured by the comments in this issue: https://github.com/yarnpkg/yarn/issues/5428

Describe the solution you'd like

Add support for generating a lockfile for each workspace. It could be enabled via a configuration option like lockfilePerWorkspace.
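For illustration, the opt-in could look something like this in .yarnrc.yml (a hypothetical sketch; neither this option nor its name exists in Yarn today):

# .yarnrc.yml (hypothetical option sketch)
lockfilePerWorkspace: true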

Describe the drawbacks of your solution

As this would be opt-in and follow well-known semantics, I'm not sure there are any drawbacks to implementing the feature. One could argue about whether it should be in the core or implemented as a plugin; I personally don't have a preference either way.

Describe alternatives you've considered

This could be implemented in the existing workspace-tools plugin. I'm unsure if the hooks provided by Yarn would allow for this, though.

Additional context

https://github.com/yarnpkg/yarn/issues/5428

arcanis commented 4 years ago

I believe the best user story is captured by the comments in this issue: yarnpkg/yarn#5428

It's best to summarize it - there are a lot of discussions there 🙂

I've seen your comment about cache layers, but I wonder if what you're looking for isn't just a way to compute a "cache key" for a given workspace?

migueloller commented 4 years ago

I'll gladly do that 😄

Current state of affairs

Yarn workspaces have a lot of benefits for monorepos, one of them being the ability to hoist third-party dependencies to reduce installation times and disk space consumed. This works by picking the dependency version that satisfies the most dependency requirements as specified by the package manifest files. If no single version can be found that matches all requirements, that's OK: the dependency is kept in the package's own node_modules folder instead of at the top level, and the Node.js resolution algorithm takes care of the rest. With PnP, I'm assuming the Node.js resolution algorithm is patched in a similar way to make it work with multiple versions of dependencies.
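To make the hoisting concrete, here is a minimal sketch of a resulting layout (package names and versions are made up for illustration):

.
├── node_modules/
│   ├── lodash/            # one version satisfies both workspaces, so it is hoisted
│   └── left-pad/          # 1.1.0, hoisted for packages/lib
├── packages/
│   ├── app/
│   │   ├── package.json   # depends on lodash ^4.17.0 and left-pad ^1.3.0
│   │   └── node_modules/
│   │       └── left-pad/  # 1.3.0, kept locally because it conflicts with the hoisted copy
│   └── lib/
│       └── package.json   # depends on lodash ^4.17.0 and left-pad ~1.1.0
├── package.json
└── yarn.lock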

I'm assuming that, because all of these dependencies are managed by a single yarn install, the Yarn team opted to have a single lockfile at the top level, where the workspace root is defined.

What's the problem?

In various monorepos, it is desirable to treat a workspace as an independent deployable entity. Most deployment solutions out there will look for manifest and lock files to set up the required dependencies. In addition, some tools, like Docker, can leverage the fact that versions are immutable to implement caching and reduce build and deployment times.

Here's the problem: because there is a single lockfile at the top level, one can't just take a package (i.e., workspace) and deploy it as one would when not using Yarn workspaces. If there were a lockfile at the package level, this would not be an issue.


I've seen your comment about cache layers, but I wonder if what you're looking for isn't just a way to compute a "cache key" for a given workspace?

It's not just computing a "cache key" for caching, but also having a lockfile to pin versions. For example, if you're deploying a workspace as a Google Cloud Function, you would want the lockfile to be there so that the installation of dependencies is pinned to what the lockfile specifies. One could copy the entire lockfile to pin versions, but then the caching mechanism breaks. So the underlying thing we're working with here is that deployment platforms use lockfiles as a cache key for third-party dependencies.

arcanis commented 4 years ago

Let's see if I understand this properly (consider that I don't have a lot of experience with Docker - I've played with docker-compose before, but there are many subtleties I'm still missing):

Did I understand correctly? If so, a few questions:

migueloller commented 4 years ago

Did I understand correctly?

I think so, but let me add a bit more context to how the layer caching mechanism works in Docker.

When building a docker image (i.e., docker build) a path to a folder is provided to Docker. This folder is called the build context. While the image builds, Docker can only access files from the build context.

In the Dockerfile, the specification for building the Docker image, there are various commands available. The most common one is COPY; it copies files from the build context to the image's filesystem, excluding patterns from .dockerignore. This is where caching comes in. Every time a Docker command is run, an image layer is created. These layers are identified by a hash of the filesystem's content. What this means is that people usually have a Dockerfile somewhat like this:

# layer 1
FROM node
# layer 2
COPY package.json yarn.lock ./
# layer 3
RUN yarn install
# layer 4
COPY . .

Again, this might be an oversimplification, but it gets the point across: we first copy package.json and the lockfile, and only then run yarn install. If package.json and yarn.lock didn't change the next time we build this image, only layer 4 has to be rebuilt. If nothing changed, then nothing gets rebuilt, of course. One could make the build context the entire monorepo, but the lockfile would still change a lot even when the dependencies of the package we're trying to build have not changed.

Workspaces typically cross-reference each other. For example, frontend and backend will both have a dependency on common. How does this fit in your vision? Even if you only deploy frontend, you'll still need common to be available as well.

This is a good question. After much thought, our solution is going to be a private npm registry. This will not only work for building Docker images but also for tools like GCP Cloud Functions or AWS Lambda. If Docker were the only tool we were using, we could use the entire monorepo as the build context but still just COPY the dependencies, and Docker layer caching would still work. This time, instead of the cache key being a single lockfile, it would be the lockfile of the package and all its transitive dependencies that live in the monorepo. That's still not the entire repo's lockfile. But since Docker isn't the only deployment platform we use that expects a yarn.lock to be there, this solution doesn't work for us.

Do you run yarn install before building the image, or from within the image? I'd have thought that the image was compiled with the install artifacts (by which point you don't really need to have the lockfile at all?), but maybe that's an incorrect assumption.

It's a best practice to do it within the image. This guarantees that native dependencies are built for the appropriate OS, and it has the benefit of caching to reduce build times in CI. In our current workaround we actually have to build everything outside, move it into Docker, and run npm rebuild. It's very hacky, though, and we're now at a point where the lack of caching is slowing us down a lot.

Specifically, what prevents you from copying the global lockfile into your workspace, then run a yarn install to prune the unused entries? You'd end up with a deterministic lockfile that would only change when the workspace dependencies actually change.

This might be a good workaround for now, perhaps in a postinstall script. Would this keep the hoisting benefits of Yarn workspaces?
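For reference, the suggested workaround would look roughly like this (paths are illustrative):

# copy the monorepo lockfile into the workspace and prune it there
# note: as discussed below, running yarn install inside the repo
# still resolves the whole project, so this doesn't work as-is
cp yarn.lock packages/frontend/yarn.lock
cd packages/frontend
yarn install   # intended to prune lockfile entries not needed by this workspace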

migueloller commented 4 years ago

Specifically, what prevents you from copying the global lockfile into your workspace, then run a yarn install to prune the unused entries? You'd end up with a deterministic lockfile that would only change when the workspace dependencies actually change.

I tried this out and unfortunately it's not as straightforward, since running yarn install anywhere within the repository uses the workspaces feature.

migueloller commented 4 years ago

Linking a comment to a related issue here: https://github.com/yarnpkg/yarn/issues/4521#issuecomment-478255917

Larry1123 commented 4 years ago

If having an independent deployable entity is the main reason for this, I currently have a plugin that is able to do it. I need to work with my employer to get it released, however. It's quite simple in how it does what it does and could likely be made better. It copies the lockfile and the workspace into a new location, edits devDependencies out of the workspace, and runs a normal install. It keeps everything pinned where it was. It reuses the Yarn cache, and keeps the yarnrc, plugins, and installed Yarn version in the output.

tabroughton commented 4 years ago

If having an independent deployable entity is the main reason for this, I currently have a plugin that is able to do it. I need to work with my employer to get it released, however. It's quite simple in how it does what it does and could likely be made better. It copies the lockfile and the workspace into a new location, edits devDependencies out of the workspace, and runs a normal install. It keeps everything pinned where it was. It reuses the Yarn cache, and keeps the yarnrc, plugins, and installed Yarn version in the output.

@Larry1123 I think your plugin could be very useful to quite a few folks, will your employer allow you to share it?

Larry1123 commented 4 years ago

I got the OK to release it; I will have to do it when I have the time.

samarpanda commented 4 years ago

@Larry1123 Wondering how you are handling Yarn workspaces. Does your plugin create a yarn.lock for each package in the workspace?

Larry1123 commented 4 years ago

In a way, yes: it takes the project's lockfile and reruns the install of the workspace in a new folder, as if it were the only workspace in the project, after also removing devDependencies. That way the resulting lockfile matches the project, but contains only what is needed for that workspace. It also currently hardlinks the cache, and copies what it can keep from the project's .yarn files.

borekb commented 4 years ago

The backend + frontend + common scenario is a good one; we have something similar, and it took me a while to realize that we sort of want two sets of workspaces. Let's say the repo looked like this:

.
├── common/
│   └── package.json
│
├── frontend/
│   └── package.json
│
├── backend/
│   └── package.json
│
├── package.json
└── yarn.lock

We're building two Docker images from it:

  1. frontend-app, where the Docker build context contains:
    • common/
    • frontend/
    • yarn.lock
  2. backend-app, where the Docker build context contains:
    • common/
    • backend/
    • yarn.lock

This can be done, and is nicely described in https://github.com/yarnpkg/yarn/issues/5428#issuecomment-403722271 (we furthermore utilize tarball context as a performance optimization), but the issue with a single lockfile stays: a change in frontend dependencies also affects the backend build.
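As a sketch of the tarball-context optimization with the layout above, the frontend-app build might be invoked roughly like this (the Dockerfile location inside the archive is an assumption):

# stream a minimal build context (only what frontend-app needs) to docker build
tar -cz common frontend package.json yarn.lock \
  | docker build -f frontend/Dockerfile -t frontend-app -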

(We also have other tooling that is affected by this, for example, we compute the versions of frontend-app and backend-app from Git revisions of the relevant paths, and a change to yarn.lock currently affects both apps.)

I don't know what the best solution would be, but one idea I had was that workspaces should actually be a two-dimensional construct in package.json, like this:

{
  "workspaces": {
    "frontend-app": ["frontend", "common"],
    "backend-app": ["backend", "common"]
  }
}

For the purposes of module resolution and installation, Yarn would still see this as three "flat" workspaces, frontend, backend and common, and the resulting node_modules structure (I don't know how PnP does this) would be identical to today, but Yarn would understand how these sets of workspaces are intended to be used together and it would maintain two additional files, yarn.frontend-app.lock and yarn.backend-app.lock (I'm not sure if the central yarn.lock would still be necessary, but that's a minor detail for this argument's sake).

When we'd be building a Docker image for frontend-app (or calculating a version number), we'd involve these files:

It would be awesome if this could work but I'm not sure if it's feasible...


As a side note, I previously thought that I wanted to have yarn.lock files in our workspaces, i.e., backend/yarn.lock and frontend/yarn.lock, but I now mostly agree with this comment:

I think the idea of Yarn 1.x monorepo is a little bit different. It isn't about independent projects under one roof, it is more about a singular big project having some of its components exposed (called workspaces).

In our case, the frontend and backend workspaces are not standalone – they require common to work. Yarn Workspaces is a great mechanism to link them together, to de-duplicate dependencies, etc.; we "just" need to have multiple sets of workspaces at Docker build time.

migueloller commented 4 years ago

I've changed where I stand on this issue and shared my thoughts here: https://github.com/yarnpkg/yarn/issues/5428#issuecomment-650329224.

borekb commented 3 years ago

@arcanis I'm reading your Yarn 2.1 blog post and there's a section on Focused Workspaces there. I don't have experience with this from either 2.x or 1.x Yarn but is it possibly solving the backend + frontend + common scenario & Docker builds?

Like, could I create a build context that contains the main yarn.lock file and then just packages/frontend + packages/common (omitting packages/backend), then focus the workspace on frontend and run the Docker build from there?

Or is it still not enough and something like named sets of workspaces would be necessary?

arcanis commented 3 years ago

I think it would, yes. The idea would be to run yarn workspaces focus inside frontend, which will install frontend+common, then to mount your whole repo inside the Docker image.

I encourage you to try it out and see whether there are blockers we can solve by improving this workflow. I'm not sold about this named workspace set idea, because I would prefer Yarn to deduce which workspaces are needed based on the main ones you want. It's too easy to make a mistake otherwise.
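A rough sketch of what that could look like (the workspace name, base image, and paths are assumptions; yarn workspaces focus comes from the workspace-tools plugin):

FROM node:16
WORKDIR /repo
# copy (or mount) the whole repo, or a pruned context containing frontend + common
COPY . .
# install only what the frontend workspace and its workspace dependencies need
RUN yarn workspaces focus frontend
RUN yarn workspace frontend build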

borekb commented 3 years ago

I'm not sold about this named workspace set idea, because I would prefer Yarn to deduce which workspaces are needed based on the main ones you want. It's too easy to make a mistake otherwise.

Agree; if the focus mode works, then it's probably better.

Do you have a suggestion on how to construct the common/frontend/backend dependencies to make it the most tricky for Yarn? Like, request some-dep@1.x from common, @2.x from frontend, and @3.x from backend? The harder the scenario, the better 😄.

migueloller commented 3 years ago

I don't know if this makes a difference in your reasoning @arcanis, but I thought it would be worth mentioning in case there's something about Yarn's design that would lend itself to this: this issue could also be solved by having a lockfile per worktree instead of per workspace. For example, each deployable workspace can itself be a worktree and specify which workspaces from the project it depends on.

Here's an example repo: https://github.com/migueloller/yarn-workspaces

It would be fine to have a lockfile for app1 and app2.

That being said, based on what I had commented before (https://github.com/yarnpkg/berry/issues/1223#issuecomment-650329692), one could just have multiple Yarn projects in the same repo and have them all share the same Yarn cache. While it wouldn't be as nice as running yarn at the top of the repo as a single project, it would help with disk size if Yarn PnP is being used.

I'm taking the definition of project > worktree > workspace from here.

migueloller commented 3 years ago

Another thought is that yarn workspaces focus app1 could be called with an option so that it modifies the top-level lockfile; perhaps this could be used to generate the "trimmed-down" lockfile for the Docker image.

I also wanted to add another use case in addition to Docker images. If one has a large monorepo where CI jobs are started depending on whether a certain "package" changed, having a shared lockfile makes that a bit hard for the same reasons it's hard on Docker's cache. If we want to check whether a workspace changed, including its dependencies, we would also want to check the lockfile. For example, some security update could have been applied that changed the patch version being used but not the version range in package.json. If the top-level lockfile is used, then the CI job would run for every change on any package. Having a lockfile per workspace would alleviate this issue by simply checking that lockfile instead of the top-level one.
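As a sketch, the CI gate might look something like this (the commit variable, workspace path, and run_app1_ci are placeholders):

# with only the root lockfile, it has to be part of the check,
# so any dependency bump anywhere re-triggers this workspace's job:
git diff --quiet "$LAST_GREEN_COMMIT"..HEAD -- packages/app1 yarn.lock || run_app1_ci

# with a per-workspace lockfile living inside the workspace,
# only changes relevant to app1 (including its pinned versions) trigger it:
git diff --quiet "$LAST_GREEN_COMMIT"..HEAD -- packages/app1 || run_app1_ci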

borekb commented 3 years ago

If one has a large monorepo where CI jobs are started depending on whether a certain "package" changed, having a shared lockfile makes that a bit hard for the same reasons it's hard on Docker's cache.

That is a good point, and we have a similar use case. Not just for CI: we also, e.g., calculate the app versions ("apps" are e.g. frontend and backend) from their respective paths and "youngest" Git commits; a single shared yarn.lock makes this problematic.

gntract commented 3 years ago

yarn workspaces focus is a great command/plugin 👍

I'm currently using it within our Dockerfile - one question about determinism (which may expose my misunderstanding of Yarn's install process):

Is it possible to run focus such that it should fail if the yarn.lock would be modified (i.e. --frozen-lockfile or --immutable but allow .pnp.js to be modified)?

arcanis commented 3 years ago

No (because the lockfile would effectively be pruned of extraneous entries, should it be persisted, so it wouldn't pass the immutable check) - I'd recommend running the full yarn install --immutable as a CI validation step.

Larry1123 commented 3 years ago

@tabroughton @samarpanda I have gotten the plugin I was working on public https://gitlab.com/Larry1123/yarn-contrib/-/tree/master/packages/plugin-production-install. I hope it works for your needs.

andreialecu commented 3 years ago

I have a slightly different use case in mind for this feature. Originally wrote it on Discord, but copying it here for posterity:

One of the downsides of monorepos seems to be that once you add new developers, you have to give them access to the whole code base, while with single repos you could partition things in a way so they have access to smaller bits and pieces.

Now, this could probably be solved with git submodules, putting each workspace in its own git repo. Only certain trusted/senior devs could then have access to the root monorepo, and work with it as one.

The only problem holding this back seems to be the lack of a dedicated yarn.lock per workspace.

With a yarn.lock per workspace it seems that the following workflow would be possible:

1. Add new dev to team, give them access to only a limited set of workspaces (separate git repos).
2. They can run yarn install, and it would install any workspace dependencies from a private package repository (verdaccio, or github packages, private npm, etc).
3. They can just start developing on their own little part of the project, and commit changes to it in isolation. The top level monorepo root yarn.lock is not impacted.
4. CI can still be set up to test everything before merging.

Seems like there would also be a need to isolate workspace dependencies to separate .yarn/cache in workspace subdirs if this approach was supported.

I'm not concerned about pushing, more concerned about pulling. I don't want any junior dev to simply pull all the company intellectual property as one simple command.

How do you guys partition projects with newer, junior (not yet established/trusted) devs, now that everyone works from home?

Larry1123 commented 3 years ago

This is something that has been a pain point for my work also. I have been wanting to work out a solution to this but have not had the time to truly work it out. Once I understand Yarn better I had intended to work out a plan of action. A holistic approach, I feel, would have various integrations into things like identity providers, git, GitHub/GitLab/Bitbucket, Yarn, and tooling, for zero-trust coordination of internal dependencies and resolutions throughout the super-repo. The integration into the git host would be there to handle cross-project concerns, though I'm not sure what level it would need. I feel that a tool like this is sorely needed, however hard to get right and time-consuming to produce. I also feel that a larger scope could be covered by creating something meant for cross-organization cooperation, as it would then have open-source uses too. It would likely take an RFC style of drafting and planning to build, as current tooling just doesn't support such workflows well. With how things go now, my work tends to lean towards not trusting new/junior devs with wide access; if they work on a project, it has to be in its own scoped repos and projects.

andreialecu commented 3 years ago

I have created a pretty simple Yarn 2 plugin that will create a separate yarn.lock-workspace for each workspace in a monorepo:

https://github.com/andreialecu/yarn-plugin-workspace-lockfile

I haven't yet fully tested it, but it seems to create working lockfiles.

I would still recommend @Larry1123's plugin above for production deployment scenarios: https://github.com/yarnpkg/berry/issues/1223#issuecomment-705094984, but perhaps someone will find this useful as well.

jakebailey commented 3 years ago

I'll mirror my comment from https://github.com/yarnpkg/yarn/issues/5428#issuecomment-712481010 here:

My need for this behavior (versioning per workspace, but still having lockfiles in each package) is that I have a nested monorepo, where a subtree is exported to another repo entirely, so it must remain independent. Right now I'm stuck with lerna/npm and some custom logic to attempt to even out versions. It would be nice if Yarn could manage all of them at once, but leave the correct subset of the "entire workspace pinning" in each. (Though I'm really not sure how this nested workspace will play out if I were to switch to berry, since berry needs to be committed to the repo - so it needs to be committed twice?)

@andreialecu That plugin looks interesting; it's almost what I'm looking for, though appears to be directed towards deployment (and not just general development). But it does give me hope that what I'm looking for might be prototype-able in a plugin.

andreialecu commented 3 years ago

@jakebailey do note that there are two plugins:

for deployment: https://gitlab.com/Larry1123/yarn-contrib/-/tree/master/packages/plugin-production-install
for development: https://github.com/andreialecu/yarn-plugin-workspace-lockfile

Feel free to take either of them and fork them. If you end up testing mine and improving it, feel free to contribute changes back as well.

DanielOrtel commented 3 years ago

@andreialecu your plugin looks pretty good and kind of what we're personally looking for, any chance you'd be willing to open a PR to the official workspace-tools package? It'd be nice if such a feature would be supported officially.

jakebailey commented 3 years ago

I haven't had time to try it out in my project (the above reminded me it exists, oops), but one thing about that plugin is that it appears to always write out to some special file, which then requires the dev to rename it if you are exporting the whole repo; it'd be nice if the plugin (and future support) didn't require that and just maintained regular yarn.lock files.

bertho-zero commented 3 years ago

I would also need this functionality, with a public git submodule for a workspace: if someone is working on the public part, they don't have a lockfile. I found this discussion which explains my situation well. I see that as an option like nmHoistingLimits; not sure if it's relevant.

I would also like my .yarnrc.yml to be in the public part and the root .yarnrc.yml to use it.

bertho-zero commented 3 years ago

https://github.com/bertho-zero/yarn-plugin-workspace-lockfile

I added two options to the yarn-plugin-workspace-lockfile plugin to choose the name of the lockfiles and allow filtering the workspaces in which to create a lockfile:

edit: The plugin now correctly uses the resolutions of the superproject.

jakebailey commented 3 years ago

@bertho-zero I finally got around to testing your version of the plugin out; it appears as though it creates a lockfile per package, where the lockfile locks the dependencies needed for each package. I think that's useful for some users, but unfortunately what I'm looking for is a little different.

I want to produce a lockfile that describes each entire workspace (packages that contain workspaces, nested or not), just as though you had copied the entire nested workspace out into another folder and run yarn to get a yarn.lock. That lockfile would look sort of like the top level lockfile, in that it locks itself and the packages beneath it. In effect, it's like asking yarn to stop going upward to find the outermost workspace. That way, the nested workspace could feasibly be copied out to another repo (or, vendored into another repo) and still have a maintained lock file.

I'm not super familiar with Yarn's APIs to know if this is possible, though. I may try my hand at it when I have the time.

jakebailey commented 3 years ago

Actually, https://github.com/milesforks/yarn-plugin-workspace-lockfile appears to be what I'm looking for, though requires a small modification to fix an assumption they're making. Neat.

EDIT: Fixed the bug in https://github.com/jakebailey/yarn-plugin-workspace-lockfile.

bertho-zero commented 3 years ago

What is the difference between workspace and package? The plugin creates a lock file per workspace, filterable on an option.

I want to produce a lockfile that describes each entire workspace (packages that contain workspaces, nested or not), just as though you had copied the entire nested workspace out into another folder and run yarn to get a yarn.lock.

This is exactly the fix I made.

The problem you are having is due to the way Yarn resolves the root project: it considers a workspace to be the root as soon as it sees a package.json next to a yarn.lock.

To bypass this problem you have to tell Yarn which lockfile to use.

lockfileFilename: ${OA_PUBLIC_LOCKFILE:-yarn.lock-workspace}

$OA_PUBLIC_LOCKFILE is yarn.lock-workspace by default in order to work outside the main monorepo, and is set to yarn.lock in the root project in order to always use the yarn.lock of the root project.

This is tested in the root project, in the submodule outside of the root project and in CI. (https://github.com/OpenAgenda/oa-public)

bertho-zero commented 3 years ago

You will have the exact same problem with the fork of the plugin:

Developers can then clone the repository they need to work on, and either rename yarn.lock-workspace to yarn.lock before installing, or they can create a .yarnrc.yml file that contains lockfileFilename: yarn.lock-workspace.

My solution has the advantage of creating the right lockfile, with the same versions as in the superproject. It works from the superproject AND the submodule without renaming or modifying any file. All that's needed is to trick the yarnrc with an environment variable.

jakebailey commented 3 years ago

What is the difference between workspace and package? The plugin creates a lock file per workspace, filterable on an option.

I guess I should have been more consistent in my wording; it's annoying that package.json has a field called "workspaces", versus calling the entire collection a "workspace" (which is how I have been referring to it, and maybe that's the wrong terminology and this specific issue is not what I want). I have two workspaces, one nested in the other, that contain many packages (two of which are the workspace roots themselves and have deps for editor tooling); I want one lockfile per workspace, not per package. When I tried your fork, I got one lockfile per package. Filtering it with "only put lockfiles in these places" so that it only created them for the root packages left me with lockfiles that didn't capture all of the other packages, just the root package dependencies themselves.

The problem you are having is due to the way yarn resolves the root project, for him a workspace is root as soon as he sees a package.json next to a yarn.lock.

This to me seems like a flaw. My fork does work, but I did hit more issues after my fix that were related to yarn.lock where things like yarn tsc or yarn prettier didn't work (and I was working on a repro to report it as a bug).

$OA_PUBLIC_LOCKFILE is yarn.lock-workspace by default in order to works outside the main monorepo, and yarn.lock in the root project in order to always use the yarn.lock of the root project.

Where are you setting this variable? For me, it seems a bit gross to need to be setting anything; if anything I'd want the public repo to look "normal", not the internal one, hence trying to make the nested repo have a yarn.lock (plus all of the tooling that assumes yarn.lock to be the place to look for deps, like compliance scanners and such).

I'll give it another go when I have time, but I'd rather file a bug about yarn stopping when it sees yarn.lock (and how to work around that) than anything else.

For reference, https://github.com/jakebailey/yarn-berry-failure is a test repo that I have nearly working, except the root finding problem you mentioned (showing when I try and run things).

bertho-zero commented 3 years ago

I understand the difference between package and workspace. That's why I added the option, but maybe it's better if, by default, it looks at package.json and only creates lockfiles in workspace roots.

The environment variable has a default value to work without any changes for those who develop in the public part.

Internally I use direnv with an .envrc (which is in the .gitignore), or I put the variable in my .zshrc, for the superproject. I don't need to set it for CI testing or deployment since the commands are run from the root and it uses the correct yarn.lock.
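For reference, the superproject override boils down to something like this (an illustrative sketch of the setup described above):

# .envrc at the superproject root (git-ignored), loaded by direnv:
# make Yarn use the root yarn.lock instead of yarn.lock-workspace
export OA_PUBLIC_LOCKFILE=yarn.lock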

jakebailey commented 3 years ago

Yeah, in my case, all devs who work on this codebase are on Windows (except me, sometimes), so it's a bit of a deal breaker to require some sort of environment variable setup to make core functionality work. That, and I'm still really not wanting the public repo to have a non-standard lockfile. The public repo shouldn't have to hint at the fact that it's actually some subset of another repo.

At this point, I'm going to try and see if there's a way to trick yarn into not stopping at the first yarn.lock it finds when in a nested workspace. Reading through the code, it seems like nested workspaces support plugins from parent workspaces (where it uses the closest first, then adds the next closest, and so on), so it must be at least aware that it's in a nested workspace at any given point. But, I'm largely new to this version of yarn (only really looked once I got this prototype working), so it's a bit of a stumbling match.

jakebailey commented 3 years ago

I hacked together a change to add an option that changes the project search algorithm; this allows me to optionally allow nested projects to continue looking upward for the "true" root even if they have a lockfile: https://github.com/jakebailey/berry/commit/1e67d41c261169db2ee5715f08bd26647fa02086

This combined with the plugin to write out the lockfile per nested root gets me what I need, though I do believe that the original issue really wanted a lockfile for every single package in the whole tree (I don't necessarily want that). It's not nice that the nested dep needs to be aware of the fact that it could be nested, but...

(This is probably not the best way to do this, but it solves my problem enough to continue testing to see if I can replace my awful npm/lerna hoist-less setup; I'm sure someone knows better how to whip this into shape or what fundamental flaws this has. I'm sure there's the "what if you vendor a dep that itself uses this" that breaks this.)

EDIT: I have an even eviler change which simply hot-patches @yarnpkg/core from a plugin at startup to get this behavior (rather, if the parent directory of the nested dir contains a file named .yarnnested, then it keeps looking up), so no yarn modification needed. I apologize in advance for that... 🙂

EDIT2: All of the above combined works great, and I'm likely to switch my project over to the yarn 3 RC + the two plugins. My public repo has a plugin that won't be used when not nested, so otherwise appears as though it's a regular project with a lock file. Then, when nested, the plugin to export yarn.lock handles ensuring the nested project is kept correct.

danoc commented 3 years ago

Turborepo added an experimental prune command to fix this issue. Their release notes do a good job of explaining why this is a problem for monorepos that use Yarn workspaces and Docker. (See the "Experimental: Pruned Workspaces" section of their release notes.)

Also, I noticed that pnpm has a shared-workspace-lockfile setting which, from what I can tell, would make it possible to avoid this issue. They do point out that turning off the shared lockfile has a few downsides, and I'm not sure what the context is behind this setting. It may be unrelated to Docker.
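(For reference, that setting lives in the .npmrc at the pnpm workspace root; setting it to false gives every project its own lockfile:)

shared-workspace-lockfile=false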

borekb commented 3 years ago

yarn-plugin-entrypoint-lockfiles

In our company repo, we're now experimenting with yarn-plugin-entrypoint-lockfiles which produces one lockfile per "entrypoint" (we couldn't come up with a better name yet 😄). It is an implementation of this idea above.

It looks like this:

.
├── packages/
│   ├── lib-a/
│   ├── lib-b/
│   ├── lib-c/
│   ├── app1/
│   └── app2/
├── yarn.lock
├── yarn.app1.lock  👈 added by the plugin
├── yarn.app2.lock  👈 added by the plugin
└── package.json

Each "entrypoint lockfile" is a subset of the main yarn.lock file, only focusing on the dependency graph of that particular entrypoint.

Entrypoints are defined in package.json like this:

{
  "name": "demo-of-entrypoint-lockfiles",
  "private": true,
  "workspaces": {
    "packages": [
      "packages/*"
    ],
    "entrypoints": [
      "packages/app1",
      "packages/app2"
      // could also be a pattern like "packages/app*"
    ]
  }
}

When we're building e.g. app1, we don't worry about the main yarn.lock file but only about yarn.app1.lock. For example, we add a specific lockfile into the Docker build context:

ADD package.json yarn.app1.lock etc...

And then build the app with

YARN_LOCKFILE_FILENAME=yarn.app1.lock
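Putting the pieces together, the image build looks roughly like this (a sketch; the base image and copied paths are assumptions, not our exact setup):

FROM node:16
WORKDIR /repo
COPY package.json yarn.app1.lock .yarnrc.yml ./
COPY .yarn .yarn
COPY packages/lib-a packages/lib-a
COPY packages/app1 packages/app1
# point Yarn at the entrypoint lockfile instead of the main yarn.lock
ENV YARN_LOCKFILE_FILENAME=yarn.app1.lock
RUN yarn install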

So far, it seems to work well. There are downsides, like more lockfiles in the repo and slightly slower yarn add and other lockfile-manipulating commands, but overall the benefits outweigh the cons for us. For example, CI jobs or Docker builds aren't started if the change isn't related to a particular entrypoint. We like it.

Kudos to @JanVoracek for implementing the plugin and kudos to the Yarn team for providing such a nice plugin API (the source code is quite short).

0x53A commented 2 years ago

+1 and I'd like to describe my use cases:

We have a monorepo. Our CI script has support for "incremental builds" for performance reasons, by just comparing the current git tree to the git tree of the last green build. When only one project has changed files, only that project (and projects depending on it) are compiled.

This obviously does not work with a single lock file, because a change in one web project will trigger a recompile of all web projects.

I think https://github.com/andreialecu/yarn-plugin-workspace-lockfile will solve this issue.

In my differ, I can ignore the root yarn.lock; the individual yarn.lock-workspace files are never actually used for installs, only as a trigger for incremental rebuilds.

cmbirk commented 2 years ago

A slightly different use case here: GCP Cloud Functions deployments look for a yarn.lock file in the specified workspace and don't traverse up the tree to find the lockfile at the root. If it doesn't find a yarn.lock file in the workspace, it falls back to npm install, which breaks with Yarn workspaces. If I could generate a yarn.lock file in each workspace, it would inform their build tools to use Yarn instead of npm and provide the necessary dependencies for that workspace.

You can see a conversation with the google devs here: https://issuetracker.google.com/issues/213632942

cmbirk commented 2 years ago

I realize my use case is very different than most, but the issue referenced above was closed and they requested I open a feature request. If a few people from this community wouldn't mind weighing in on the issue I created I would be very grateful!

KurtGokhan commented 2 years ago

Let me state our use case as well. Instead of a monorepo, we prefer to set up a repo with git submodules. The root repo acts as the workspace root and the submodules are projects in that workspace. This way, each project can have its own set of issues, branches, PRs, CI, and branch protection rules. This method gives us the best of both worlds.

It would be perfect if we could have a yarn.lock for each workspace instead of only one at the root repo. Then each project could act standalone (perfect for CI), but the root project could also act as a monorepo/workspace if needed (perfect for development).

haf commented 2 years ago

There are some concerns when putting together a solution to this:

  1. You can't COPY . . because that'll be a huge amount of stuff in a monorepo (blog posts online show this as a "solution")
  2. You have to resolve, pack and copy each dependent workspace to the one that you build (resolving workspace:* links, etc) (so it needs to be a plugin or a native command of some sort)
  3. You have to get both devDependencies and dependencies to the first stage builder in the Dockerfile, that is used to compile e.g. TypeScript (or what have you)
  4. The Dockerfile must be in the service / lib / subfolder / workspace folder (because that's how monorepos work)
  5. The invocation to docker must be from the root so that the right yarn.lock file can be copied and managed
  6. The intermediate storage folder (for the bundles/packs) should not be recreated, or otherwise this busts the Docker caching mechanism
  7. In dev, TypeScript must have references: [ { path: '../packages/otherlib' } ] setting in order to resolve the types well and this conflicts with packing up dependencies as node modules during build time (see step 2)

Unfortunately yarn.build crashes (https://github.com/ojkelly/yarn.build/issues/187) when trying to do a bundle command (so this is not a solution), and docker-build fails by not following point 3 above (#22) and point 6 (#23).

More references:

Could the powers that be please add a fully working workspace-enabled typescript recipe here? https://yarnpkg.com/getting-started/recipes — because it's not a "five minute fix" to look this up (8 hours and counting here...)

ojkelly commented 2 years ago

In dev, TypeScript must have references: [ { path: '../packages/otherlib' } ] setting in order to resolve the types well and this conflicts with packing up dependencies as node modules during build time (see step 2)

Just on this point, I've found this approach flaky and not required when you can leverage Yarn and TypeScript. We need to treat the workspaces like they're independent packages coming from npm, so each workspace needs a build command that emits types, and a package.json#main. Then you need to build them all.

But once done, you don't need to use workarounds in tsconfig.json. This is one of the design goals I have with yarn.build — monorepo build tooling that lets us leverage the standard conventions of the ecosystem.
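As a sketch of what that looks like per workspace (names and paths are made up for illustration):

{
  "name": "@example/otherlib",
  "version": "1.0.0",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "scripts": {
    "build": "tsc -p tsconfig.json"
  }
}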


On the rest of your points, it sounds like this is about the point of scale where a feature from Bazel becomes useful. It allows you to reference/pull in artifacts from other Bazel repositories.

If there was an option for a lockfile per workspace, that would be functionally similar to multiple repos. While it's possible to use ignore files for Docker/yarn.build's bundle/and others, it's not practical to ignore the dependencies cached or vendored by Yarn on a per-workspace basis.


Related to the questions above, I recently added --changes, --since ${COMMIT_HASH} and --since-branch main to yarn.build.

I use yarn build -c deploy --changes in CI to run package.json#scripts.deploy for every workspace with a git change in the last commit.

And yarn build --since-branch main (and test) to build PRs, so that every commit in the PR that's different from main is included in the check of which workspaces to build.


Re the typescript example: I realise this is a very contrived example, but https://github.com/ojkelly/yarn.build/tree/main/packages/examples is a set of typescript packages that have dependencies on each other, and import with types.

eric-burel commented 2 years ago

Hey folks, I wanted to add a use case and repro to this issue: we have packages and starter apps in our monorepo.

Here is the full monorepo: https://github.com/VulcanJS/vulcan-npm, using Yarn 3 workspaces.

See https://github.com/remix-run/remix/issues/683#issuecomment-1130176988 as well. This is similar to what @KurtGokhan described earlier.

With @borekb's approach, we can probably set up a palliative like so:

However, the postinstall will be called when I yarn install the monorepo as well, so I need an additional check to tell whether I am in the monorepo or not, e.g. using some env variable.

I feel like https://github.com/JanVoracek/yarn-plugin-entrypoint-lockfiles and https://github.com/varsis/generate-lockfile could deserve some renewed attention; they could help close this issue.

eric-burel commented 2 years ago

Hi folks, I gave a shot at generate-lockfile but it couldn't read the root yarn.lock: https://github.com/varsis/generate-lockfile/issues/4#issuecomment-1177248679

Will try yarn-plugin-entrypoint-lockfiles.

Edit: this seems to work much better, see my example monorepo: https://github.com/VulcanJS/vulcan-npm/pull/132/files. I've written a README for this package as well: https://github.com/JanVoracek/yarn-plugin-entrypoint-lockfiles/pull/2

Last issue: I hit the error "Your lockfile needs to be updated, but yarn was run with --frozen-lockfile" in CI. The yarn.lock seems not totally up to date; the diff between the yarn.lock updated after running yarn, and the yarn.vulcan-remix.lock output automatically by the plugin, looks like this:

diff yarn.lock yarn.vulcan-remix.lock 
1663,1665c1663,1665
< "@types/react-dom@npm:<18.0.0, @types/react-dom@npm:^17.0.14":
<   version: 17.0.17
<   resolution: "@types/react-dom@npm:17.0.17"
---
> "@types/react-dom@npm:^17.0.16":
>   version: 17.0.16
>   resolution: "@types/react-dom@npm:17.0.16"
1668c1668
<   checksum: 23caf98aa03e968811560f92a2c8f451694253ebe16b670929b24eaf0e7fa62ba549abe9db0ac028a9d8a9086acd6ab9c6c773f163fa21224845edbc00ba6232
---
>   checksum: 2f41a45ef955c8f68a7bcd22343715f15e1560a5e5ba941568b3c970d9151f78fe0975ecf4df7f691339af546555e0f23fa423a0a5bcd7ea4dd4f9c245509936
1672,1674c1672,1674
< "@types/react@npm:^17, @types/react@npm:^17.0.43":
<   version: 17.0.47
<   resolution: "@types/react@npm:17.0.47"
---
> "@types/react@npm:^17.0.16":
>   version: 17.0.44
>   resolution: "@types/react@npm:17.0.44"
1679c1679
<   checksum: 2e7fe0eb630cb77da03b6da308c58728c01b38e878118e9ff5cd8045181c8d4f32dc936e328f46a62cadb56e1fe4c5a911b5113584f93a99e1f35df7f059246b
---
>   checksum: ebee02778ca08f954c316dc907802264e0121c87b8fa2e7e0156ab0ef2a1b0a09d968c016a3600ec4c9a17dc09b4274f292d9b15a1a5369bb7e4072def82808f
5949,5952c5949,5952
< "graphql@npm:^16.3.0, graphql@npm:^16.4.0":
<   version: 16.5.0
<   resolution: "graphql@npm:16.5.0"
<   checksum: a82a926d085818934d04fdf303a269af170e79de943678bd2726370a96194f9454ade9d6d76c2de69afbd7b9f0b4f8061619baecbbddbe82125860e675ac219e
---
> "graphql@npm:^15.6.2":
>   version: 15.8.0
>   resolution: "graphql@npm:15.8.0"
>   checksum: 423325271db8858428641b9aca01699283d1fe5b40ef6d4ac622569ecca927019fce8196208b91dd1d8eb8114f00263fe661d241d0eb40c10e5bfd650f86ec5e
11725c11725
< "vulcan-remix@workspace:.":
---
> "vulcan-remix@workspace:starters/remix":
11727c11727
<   resolution: "vulcan-remix@workspace:."
---
>   resolution: "vulcan-remix@workspace:starters/remix"

To fix this I have to drop --frozen-lockfile in my CI during yarn install, but this is a bad practice.

Also @borekb: YARN_LOCKFILE_FILENAME is not documented anywhere; is that custom to your setup? For now I just rename the file to yarn.lock in my CI after copying it to the right place.

eric-burel commented 1 year ago

Hi, just to rephrase what I think is needed now to close this issue:

1. We need a way to run yarn against a custom lockfile, like YARN_LOCKFILE_FILENAME=yarn.remix.lock yarn.
2. We need a way to run yarn so that it generates the lockfile, but not node_modules (or whatever solution is used for modules).

The idea is that you could "trick" Yarn into generating a lockfile without actually installing packages. Since this lockfile is NOT named yarn.lock, it won't break package hoisting for workspaces when you do a normal yarn in the monorepo root.

The process could be as follows:

It could even be simplified like this:

Maybe those options kinda exist today? But I couldn't find anything like that in the docs.
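For what it's worth, a sketch of what this might look like, assuming lockfileFilename can be overridden through the YARN_LOCKFILE_FILENAME environment variable (as in the plugin setups above) and using Yarn 3's --mode=update-lockfile flag, which appears close to requirement 2:

# generate/refresh yarn.remix.lock without linking node_modules
YARN_LOCKFILE_FILENAME=yarn.remix.lock yarn install --mode=update-lockfile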