[Feature] Allow vendor to control dependency versions

rix0rrr commented 2 years ago

[ ] I'd be willing to implement this feature (contributing guide)
[ ] This feature is important to have in this repository; a contrib plugin wouldn't do

Describe the user story

We are the vendors of a CLI tool that touches people's AWS infrastructure. Since the tool will by design be run in situations where it has a lot of access to our users' AWS accounts, there are some concerns around this tool being a target for supply chain attacks.

The NPM ecosystem commonly has supply chain incidents of this kind, and it's only a matter of time before one happens to us.

We'd therefore like to have tight control of our tool's dependencies. Preferably, we want to ensure that when a user installs our tool, they get a "known good" version of the tool and all of its dependencies. However, that requires that we can control the point versions of every dependency in our dependency closure.

Yes, we can control the version strings we use to depend on our dependencies: we can avoid using ^ or ~ or >= in our package.json. But we cannot control the package.jsons used by our dependencies, and their dependencies, and it's an unfortunate truth that nearly everyone uses ^ everywhere, which means any dependency in our closure is potentially an attack vector.
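To make the distinction between pinned and ranged version strings concrete, here is a minimal sketch of a check that flags every spec in a `dependencies` map that is not an exact version. The names `isExactVersion` and `findUnpinned` and the sample dependency map are made up for illustration; this is not part of Yarn or npm, and the check is deliberately conservative (anything that is not a plain `major.minor.patch` string counts as unpinned).

```typescript
// Hypothetical helpers for illustration only; not part of Yarn or npm.
type DependencyMap = Record<string, string>;

// Matches a plain semver version such as "1.4.0" or "2.0.0-beta.1".
// Anything else ("^1.4.0", "~1.4.0", ">=1.4.0", "1.x", "latest", URLs)
// is conservatively treated as unpinned.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

function isExactVersion(spec: string): boolean {
  return EXACT_VERSION.test(spec.trim());
}

// Returns "name@spec" for every dependency whose spec is not an exact version.
function findUnpinned(dependencies: DependencyMap): string[] {
  return Object.entries(dependencies)
    .filter(([, spec]) => !isExactVersion(spec))
    .map(([depName, spec]) => `${depName}@${spec}`);
}

// Sample input (assumed, not taken from a real package.json).
const sampleDependencies: DependencyMap = {
  "dep-a": "1.4.0",
  "dep-b": "^4.1.2",
  "dep-c": "~7.3.5",
};

// Logs the two ranged entries: dep-b@^4.1.2 and dep-c@~7.3.5
console.log(findUnpinned(sampleDependencies));
```

A check like this only covers our own package.json. It says nothing about the specs our dependencies use in theirs, which is the part we cannot control.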

You can say yarn.lock solves this issue, but it doesn't really. yarn.lock only applies to existing projects. It doesn't apply to new projects, nor does it apply to global installations. Tomorrow, the owner of any package in our dependency closure can release a minor point upgrade with malicious code in it and every user who initializes a new project or uses yarn to install the CLI globally after that is immediately affected.

I would also add that relying on yarn.lock to solve this issue punts it from someone with a lot of information (the person who wrote and vends the CLI) to someone with very little information (the consumer of the CLI, who should be able to treat the abstraction provided as a black box and shouldn't need to know about the transitive dependency tree hidden behind it). Case in point: it doesn't seem right that we have to rely on 1000s of our users having the knowledge that colors 1.4.2 became unusable and they should all revert their dependency to 1.4.0 and take care not to upgrade it again in the future... especially given that the alternative is (or at least, should be) that we fix this automatically for them.

To mitigate this problem for our users, we need to be able to control the dependency closure that our CLI tool runs against.

For users using NPM, we can do this today (using the npm-shrinkwrap.json mechanism), but we have no mechanism to protect our Yarn users in the same way. And I really don't know what to tell them, except to stop using Yarn, which I'd rather not do.

Describe the solution you'd like

I have seen on some website that it is impossible for you to consume the npm-shrinkwrap.json file directly. I'm not asking for that. I'm asking for a comparable mechanism. It's fine if it requires a new file with other data in it. If the mechanism exists, we can invest time to produce an appropriate configuration file.

We just need a way for a package vendor to control versions in a subsection of the dependency tree, in whatever way possible.

We need every user of Yarn out there to be protected out of the gate, in every situation where they might install this tool, so this cannot go into a plugin.

Describe the drawbacks of your solution

I imagine it might be hard to define how this interacts with the yarn.lock of the environment into which packages are being installed, or how it interacts with resolutions. To a first approximation, that might be resolved by saying that a yarn-shrinkwrap.lock (or whatever) file needs to be complete, and no dependencies may be requested in that subtree that are not covered by the shrinkwrap file. Let's say you initially don't support resolutions overrides either.
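As a rough illustration of the "complete" rule, the sketch below walks a dependency subtree and reports every package that is requested but not covered by the pin file. Everything here is hypothetical: no such file exists in Yarn, and the `PinnedVersions` and `Manifests` shapes, the function name, and the package names are invented for the example. It checks coverage by package name only; verifying that each pinned version also satisfies the requested range is left out.

```typescript
// Hypothetical data shapes; Yarn has no such file format today.
type PinnedVersions = Record<string, string>; // package name -> exact pinned version
type Manifests = Record<string, Record<string, string>>; // package name -> its dependencies (name -> range)

// Walks the subtree reachable from rootDependencies and returns, sorted,
// the names of all packages that have no entry in the pin file.
function findUncoveredDependencies(
  rootDependencies: string[],
  manifests: Manifests,
  pinned: PinnedVersions,
): string[] {
  const missing = new Set<string>();
  const visited = new Set<string>();
  const pending = [...rootDependencies];

  while (pending.length > 0) {
    const current = pending.pop() as string;
    if (visited.has(current)) continue; // also guards against dependency cycles
    visited.add(current);

    if (!Object.prototype.hasOwnProperty.call(pinned, current)) {
      missing.add(current);
    }
    // Packages without a known manifest are treated as having no dependencies.
    for (const dependencyName of Object.keys(manifests[current] ?? {})) {
      pending.push(dependencyName);
    }
  }
  return Array.from(missing).sort();
}

// Assumed example: the CLI depends on "table-lib", which depends on "color-lib".
const exampleManifests: Manifests = {
  "table-lib": { "color-lib": "^1.4.0" },
  "color-lib": {},
};
const examplePins: PinnedVersions = { "table-lib": "0.3.11" };

// "color-lib" is requested in the subtree but not pinned, so it is reported.
console.log(findUncoveredDependencies(["table-lib"], exampleManifests, examplePins));
```

Under the rule proposed above, an install would be rejected whenever this list is non-empty, which sidesteps the question of how to fall back to the surrounding yarn.lock.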

I suppose the feature might also be abused by people who don't understand its purpose, might reduce chances of package deduplication, and might complicate the implementation of PnP in ways I can't fully foresee.

I'm still not sure any of the "it will be hard" arguments are reasons not to do it, given what's at stake.

Describe alternatives you've considered

If worse comes to worst, I suppose we can always choose to vendor-in all our dependencies, bloating our module size and potentially creating a bit of licensing/copyright hassle we'll have to sort through. Other than that, I really can't think of other alternatives.

I would hate to have to do that though, which is why I'm here asking for you to reconsider first.

arcanis commented 2 years ago

The technical challenge isn't the main hurdle, although it is certainly one. My main concern is that this goes against the current ecosystem. If every library starts forcing dependencies, it's hard to ascertain how bad things will get in terms of project size. Some maintainers already spread nonsense like "don't use lockfiles" or "only ship esm", you can be sure there would be a bunch of "always pin all your dependencies" blog posts that would suddenly appear, and it's hard to evaluate the medium- and long-term effects.

Out of curiosity, did you make an actual evaluation of the size typical projects nowadays would have if all their dependencies were fully pinned (and thus almost each package was duplicated N times)?

Overall my thinking would be to recommend you to use bundling which, even though I understand you don't like it that much, is a portable solution that works now and provides various extra benefits (like the download size being smaller, and having just enough friction that users won't make it the default way they ship their packages, which would likely be disastrous).

rix0rrr commented 2 years ago

> you can be sure there would be a bunch of "always pin all your dependencies" blog posts that would suddenly appear, and it's hard to evaluate the medium- and long-term effects.

It might not be as bad as all that. Fortunately there is already some prior art we can look at: NPM supports this feature today, and it's definitely not like every project ships with a shrinkwrap file, nor is there a plethora of blog posts arguing that people should start shipping them. There was some confusion initially, but that has all pretty well settled down by now.

> Overall my thinking would be to recommend you to use bundling

I thought you might say something like that.

This is not just about us: to distribute dependable CLIs this is a must-have feature imo, for the reasons above. To be honest, I'm not quite sure why other CLI vendors aren't more concerned about this... but I suppose we'll have to wait for a large-scale attack to hit first to get this onto people's radars.

Until then, I suppose we'll have to go the homebrew route.

BigForNothing commented 1 year ago

@arcanis How are we supposed to have reproducible builds? We can, as long as we stay within the monorepo. As soon as we get to multiple repos, it stops happening.

Given a package published without an npm-shrinkwrap.json (or even a yarn.lock), every time I install it, I might get different dependencies included, right? I can specify exact versions in the package.json, but those packages will still install modules in accordance with the versions specified, including carets and ranges.
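A minimal sketch of why that happens, assuming a simplified caret rule and plain `major.minor.patch` versions (no prerelease or build tags). This is not the real npm or Yarn resolver, and `resolveCaret` and the version lists are made up for the example; it only shows that the same range picks a different version once a newer release is published.

```typescript
// Simplified illustration of caret-range resolution; not the real resolver.
type Version = [number, number, number];

function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) throw new Error(`Unsupported version: ${text}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Picks the highest published version allowed by a caret range such as "^1.4.0".
// A caret allows changes that do not modify the left-most non-zero component.
function resolveCaret(range: string, published: string[]): string | undefined {
  if (range.charAt(0) !== "^") throw new Error(`Not a caret range: ${range}`);
  const base = parseVersion(range.slice(1));
  const upper: Version =
    base[0] > 0 ? [base[0] + 1, 0, 0]
    : base[1] > 0 ? [0, base[1] + 1, 0]
    : [0, 0, base[2] + 1];

  let best: string | undefined;
  for (const candidate of published) {
    const version = parseVersion(candidate);
    const inRange = compareVersions(version, base) >= 0 && compareVersions(version, upper) < 0;
    if (inRange && (best === undefined || compareVersions(version, parseVersion(best)) > 0)) {
      best = candidate;
    }
  }
  return best;
}

// Same range, two points in time (version lists are assumed for illustration).
const publishedToday = ["1.3.9", "1.4.0"];
const publishedTomorrow = ["1.3.9", "1.4.0", "1.4.1"];

console.log(resolveCaret("^1.4.0", publishedToday)); // 1.4.0
console.log(resolveCaret("^1.4.0", publishedTomorrow)); // 1.4.1
```

Pinning exact versions in my own package.json only fixes the first level. Every package below it that declares a caret range goes through a resolution step like this one at install time.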

With npm, an included npm-shrinkwrap.json would prevent that from happening. Every single time, it would be exactly the same. Ideally yarn can publish and consume npm-shrinkwrap.json.
In our case, we have multiple monorepos. So while a yarn.lock works great within a monorepo, it doesn't help for a published package.
I'm all for this being a flag option or a .yarnrc.yml option, and not being default. Even a plugin supporting it would be fine. There are valid reasons why this is needed, even if it isn't needed for everyone.
I have to agree with @rix0rrr. Everything he has stated is spot on.

yarnpkg / berry

[Feature] Allow vendor to control dependency versions #3968