yarnpkg / yarn

The 1.x line is frozen - features and bugfixes now happen on https://github.com/yarnpkg/berry

https://classic.yarnpkg.com


yarn.lock with workspaces and git subtree #5434

Open bebbi opened 6 years ago

bebbi commented 6 years ago

Feature

What is the current behavior?

Workspaces: One yarn.lock per project root

What is the expected behavior?

To address specific collaboration needs, some workspaces in my project are git subtree based sub-repos. They need to be developed and tested both separately and from within the monorepo. Ideally, there would be one yarn.lock at project root and one per workspace - or is there a good recommendation to follow?

Related

#4521

#5428

connectdotz commented 6 years ago

Based on the Yarn blog:

When you publish a package that contains a yarn.lock, any user of that library will not be affected by it. When you install dependencies in your application or library, only your own yarn.lock file is respected. Lockfiles within your dependencies will be ignored.

It doesn't seem necessary to bundle yarn.lock when publishing the individual packages... It is more of a development artifact for the whole repo.

some workspaces in my project are git subtree based sub-repos. They need to be developed and tested both separately and from within the monorepo.

Maybe the question is not the yarn.lock file per se, but whether you really want a monorepo... If these packages are not closely related and evolve independently anyway, maybe yarn workspaces, at least for now, is not the best way to set them up?

bebbi commented 6 years ago

@connectdotz Yes, it's about dev, not publishing.

It's a typical monorepo situation - shared app and API. Just that an external collaborator works on the core logic of one single module. I believe it is a pretty common situation.

connectdotz commented 6 years ago

I see. I actually had the exact situation you described: I was working on a vscode-jest PR that needed to change the jest-editor-support package in the jest workspaces. As an external collaborator, I just cloned the whole jest workspaces...

One of the fundamental assumptions of a monorepo is that these packages prefer to be developed together. The fact that you and others mentioned the need to sometimes develop these packages independently (thus a package-level yarn.lock) contradicts this fundamental assumption. I am sure the need is legit: maybe the package is complex and large, maybe it's for logistical or security reasons... But the solution is worth debating: should we abandon the fundamental assumption of workspaces, or is it better to think about "composable" workspaces, i.e. nested workspaces? I have a feeling the package-level yarn.lock might not be the end of the story. The exception packages might soon be asking for a separate security model, for spanning git repositories, or, because they are too large and complex, to be broken down into smaller packages (i.e. their own workspace)...

But before we go down the rabbit hole, I think it is also worth comparing what happens if we just move these "exception" packages into their own repos as independent packages. I would think you already have lots of logistical issues allowing these monorepo packages to be worked on independently; would splitting them out reduce these issues? What exactly is the functionality lost, and could something like yarn link help with it?

bebbi commented 6 years ago

In my case, the choice between monorepo or not is driven by the cost (complexity and risk) of each option. Working around the problem by adding yarn.lock to .gitignore in the sub-repo seems like perhaps the lowest penalty, certainly cheaper than breaking the monorepo-based development flow.

This, and the existence of a number of established efforts to bridge container repos and sub-repos, may point at this request being worthwhile to consider: git-submodules, git-subtree, git-subrepo, splitsh, etc.

The request could perhaps be restated as "not breaking git-submodules/git-subtree workflows on workspaces", and I believe then it's not a rabbit hole because these tools can provide a pragmatic boundary of what is to be supported and what not (e.g. security is out of scope).

Also, as we already have a node_modules per workspace, a yarn.lock addition wouldn't seem too odd to me, and it seems to match up with some of the other referenced issues.

Perhaps it would be useful if others who are into monorepos and git submodules/subtrees can chime in as well.

connectdotz commented 6 years ago

The request could perhaps be restated as "not breaking git-submodules/git-subtree workflows on workspaces"

Hmm... I think I start to see what you are asking...

git-submodules, git-subtree, and monorepo are 3 different approaches to the same problem: helping develop larger/multi-package projects. Each approach has its pros/cons and a different sweet spot. I think monorepos exist partly because of the complexity and pitfalls of git-submodule and git-subtree... people are looking for simpler and more cohesive solutions. However, a monorepo, while simpler than the others in some respects, is certainly not a panacea; it has many pitfalls as well. Choice is painful... I can understand why you are trying to mix workspaces with subtree...

Nevertheless, while generating a yarn.lock for each workspace might not seem a lot, there will be new issues raised, such as:

when package-level yarn.lock is conflicting with repo-level yarn.lock, what should happen?
if a subtree can be developed outside of the monorepo, how do they reference other monorepo packages? how do we guarantee the yarn.lock there is actually in sync with the monorepo's?
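
To make the first question concrete, here is a minimal sketch of how a conflict between the two lockfiles could be detected. It is not Yarn's own code: the function names `parse_lock_versions` and `find_conflicts`, the package names, and the versions are all hypothetical, and the parser assumes the classic v1 lockfile layout (an unindented header line listing descriptors, followed by an indented `version "x.y.z"` line). It ignores everything else in the file.

```python
from __future__ import annotations

import re

# Matches the two-space-indented version line of a v1 lockfile entry.
VERSION_LINE = re.compile(r'^  version "([^"]+)"\s*$')


def parse_lock_versions(text: str) -> dict[str, str]:
    """Map each descriptor (e.g. 'shared-util@^4.17.15') to its locked version.

    Simplified: assumes descriptors contain no commas and only reads
    the 'version' field of each entry.
    """
    versions: dict[str, str] = {}
    current: list[str] = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not line[0].isspace() and line.rstrip().endswith(":"):
            # Header line: one or more comma-separated descriptors.
            header = line.rstrip()[:-1]
            current = [d.strip().strip('"') for d in header.split(",")]
            continue
        match = VERSION_LINE.match(line)
        if match and current:
            for descriptor in current:
                versions[descriptor] = match.group(1)
    return versions


def find_conflicts(root_lock: str, package_lock: str) -> dict[str, tuple[str, str]]:
    """Return descriptors that both files lock, but to different versions."""
    root = parse_lock_versions(root_lock)
    package = parse_lock_versions(package_lock)
    return {
        d: (root[d], package[d])
        for d in sorted(root.keys() & package.keys())
        if root[d] != package[d]
    }


# Hypothetical lockfile contents for illustration only.
ROOT_LOCK = '''# yarn lockfile v1

shared-util@^4.17.15:
  version "4.17.21"

"other-dep@^1.3.0":
  version "1.3.0"
'''

PACKAGE_LOCK = '''# yarn lockfile v1

shared-util@^4.17.15:
  version "4.17.15"

"other-dep@^1.3.0":
  version "1.3.0"
'''

print(find_conflicts(ROOT_LOCK, PACKAGE_LOCK))
```

The sketch only reports where the files disagree. Deciding which side wins is exactly the open question raised above.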

As mentioned above, it also breaks the fundamental monorepo assumption that these packages should be developed as a whole while remaining capable of being published independently. Not that yarn workspaces can't extend its definition or scope; it is ultimately up to the core team to decide. IMHO, generating a package-level yarn.lock is neither sufficient to resolve workspaces + subtree nor does it advance the core mission of workspaces today. There are other suggestions on how to address the known monorepo limits, such as the one mentioned above; you may find those helpful for your situation as well.

For people who are looking to publish/deploy individual packages, such as to docker, AWS lambda, let's consolidate the discussion in #4521

bebbi commented 6 years ago

when package-level yarn.lock is conflicting with repo-level yarn.lock, what should happen? if a subtree can be developed outside of the monorepo, how do they reference other monorepo packages? how do we guarantee the yarn.lock there is actually in sync with the monorepo’s?

Dependencies work well for us, but I don’t have a proposal for conflicts.

For dependencies, the subtree's dependencies are resolved by simply publishing relevant dependencies from the monorepo. Up to the monorepo dev to ensure relevant modules get published. For conflicts, I don’t know how a subtree yarn.lock change would get synced back in, I don’t know enough to chime in.

Thanks for the discussion, happy to have this issue pursued or closed per your preference.

alexgorbatchev commented 2 years ago

I want to add a few points to this discussion.

Having a mechanism for merging a subtree-level or dependency-level yarn.lock into the monorepo yarn.lock would be extremely useful. Here's an example. A team updates moduleA's dependencies to the latest versions and introduces a CVE via a transitive dependency. moduleA is used by 50 monorepos at the company. When these 50 monorepos receive the latest version of moduleA, there are now 50 VULN tickets filed and each monorepo has to be fixed individually.

What about Dependabot, you may ask? Well, as a big company, we have our internal Artifactory, which Dependabot can't see, and so we can't use it.

An imaginary easy fix is for the team that owns moduleA to update its yarn.lock to use the latest transitive dependency and allow this change to propagate to all the monorepos, where it will be "magically" merged with the main yarn.lock.

I'm not suggesting this merge should happen automatically. This could be part of yarn commands and conflicts could be resolved via user prompt. Obviously this would need a lot more thought put into it.
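
As a rough illustration of that idea, the sketch below merges the locked versions from a dependency's lockfile into a monorepo's, delegating every disagreement to a resolver callback that stands in for the user prompt. Nothing here is an existing Yarn command or API: `merge_locked_versions`, `prefer_newer`, and the package names are hypothetical, both lockfiles are assumed to be already parsed into descriptor-to-version mappings, and versions are assumed to be plain dotted numbers without prerelease tags.

```python
from __future__ import annotations

from typing import Callable

# A resolver receives (descriptor, current_version, incoming_version)
# and returns the version to keep. An interactive prompt could be one.
Resolver = Callable[[str, str, str], str]


def version_key(version: str) -> tuple[int, ...]:
    """Turn '1.2.6' into (1, 2, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def prefer_newer(descriptor: str, current: str, incoming: str) -> str:
    """Example non-interactive resolver: keep the higher version."""
    return max(current, incoming, key=version_key)


def merge_locked_versions(
    root: dict[str, str],
    incoming: dict[str, str],
    resolve: Resolver,
) -> dict[str, str]:
    """Merge incoming locked versions into a copy of the root mapping."""
    merged = dict(root)
    for descriptor, version in incoming.items():
        current = merged.get(descriptor)
        if current is None or current == version:
            merged[descriptor] = version  # new entry, or already in agreement
        else:
            merged[descriptor] = resolve(descriptor, current, version)
    return merged


# Hypothetical data: moduleA's lockfile pins a patched transitive dependency.
monorepo_lock = {"transitive-dep@^1.2.0": "1.2.0", "shared-util@^4.17.15": "4.17.21"}
module_a_lock = {"transitive-dep@^1.2.0": "1.2.6", "helper@^4.1.0": "4.3.4"}

merged = merge_locked_versions(monorepo_lock, module_a_lock, prefer_newer)
print(merged)
```

A real implementation would also have to check that each chosen version still satisfies its descriptor's range, and drop entries the monorepo never actually requires. The sketch skips both.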

At the core, the problem here is that resolving CVEs often can't happen at the source of the problem, or even in the middle. The fix can't effectively be distributed and so the same work ends up being done over and over again. This is more or less a big company problem and probably doesn't apply to individual developers that much.