Open appsforartists opened 1 year ago
It's come to my attention that scanning for and squatting on private packages in public registries is being publicized as an attack called dependency confusion.
gitpkg+yarn users are particularly vulnerable due to this issue.
Imagine again that `some-project` depends on `package-a` and `package-b` with git URLs, and that `package-a` depends on `package-b` via `workspace:`. If `package-b` isn't published on NPM, this works today, as the root instance of `package-b` fulfills `package-a`'s `import {} from 'package-b'`.
However, at any point, an attacker could discover that `package-b` is used privately and publish an evil `package-b` to NPM. All of a sudden, unbeknownst to the private users, the evil clone is being downloaded alongside `package-a`, because there is no way for the publisher to control how `workspace:` schemas are resolved by `yarn pack`.
Would you accept a PR implementing `protocolFormat` (as described above) to resolve this?
> the evil clone is being downloaded alongside `package-a`

Why? If `package-a` uses `workspace:`, then Yarn will use whatever the project provides, and not download it from anywhere.
If `package-a` has `workspace:` in its `package.json`, `yarn pack` will resolve that to a specific version number and mint that number into the tarball.
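To make that rewriting step concrete, here is a minimal TypeScript sketch. It is not Yarn's actual code: `Manifest`, `resolveWorkspaceRange`, and `packManifest` are hypothetical names, and the handling of `workspace:*`, `workspace:^`, and `workspace:~` reflects my understanding of Yarn's documented behavior, so treat it as an assumption.

```typescript
// Hypothetical, simplified manifest shape (not Yarn's internal type).
type Manifest = {
  name: string;
  version: string;
  dependencies?: Record<string, string>;
};

// Assumed mapping: `workspace:*` -> exact version, `workspace:^` / `workspace:~`
// -> the same operator applied to the sibling's version, and an explicit range
// such as `workspace:^1.2.0` -> the range with the prefix dropped.
function resolveWorkspaceRange(range: string, siblingVersion: string): string {
  const prefix = "workspace:";
  if (!range.startsWith(prefix)) return range;
  const rest = range.slice(prefix.length);
  if (rest === "*") return siblingVersion;
  if (rest === "^" || rest === "~") return rest + siblingVersion;
  return rest;
}

// Returns a copy of the manifest as it would be written into the tarball:
// every `workspace:` dependency is replaced by a plain semver range.
function packManifest(manifest: Manifest, siblings: Map<string, Manifest>): Manifest {
  const dependencies: Record<string, string> = {};
  for (const [name, range] of Object.entries(manifest.dependencies ?? {})) {
    const sibling = siblings.get(name);
    dependencies[name] = sibling ? resolveWorkspaceRange(range, sibling.version) : range;
  }
  return { ...manifest, dependencies };
}
```

The point for this thread is the output: once packed, `package-a` carries an ordinary semver range for `package-b`, with nothing that records where `package-b` was meant to come from.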
Therefore:

- If `some-project` (which is not in the workspace) specifies the `git` URLs for `package-a` and `package-b`, everything will work as expected.
- If `some-project` sets `resolutions` for those packages, everything will work as expected.
- If `package-a` has a transitive dependency on `package-b` and `resolutions` isn't set in `some-project`, whether everything works depends on whether the name `package-b` is used by a package on NPM.
  - If `package-b` isn't published on NPM, `some-project`'s copy will be shared with `package-a`.
  - If `package-b` is published on NPM (and `yarn` is run), `yarn` will prefer the public version. Hence, dependency collision: `some-project` is importing `package-a` and `package-b` over `git`, but `package-a` is also downloading a copy from NPM.
  - If `package-b` is published after `some-project` is created, users might suddenly find themselves downloading the evil clone of `package-b` that didn't exist when the dependency was first introduced.

The proposed resolution (teaching `yarn pack` how to write the correct version schema for private dependencies) ensures that the public package is never incorporated, even if the containing project doesn't set `resolutions`.
Describe the user story
We don't have an internal npm server; rather, we use a git remote as our private packages host. This leads to dependencies like this:
In this case, `some-project` is a Yarn-managed monorepo containing packages like `package-a` and `package-b`. `package-a` depends on `package-b` using the `workspace:` protocol.

Packages are published by the excellent gitpkg, which runs `yarn pack` for each package and pushes the result as a git tag. `yarn pack` resolves the `workspace:` protocol to the version in the dependency's `package.json`. Thus, yarn is currently assuming the packed files will be distributed with an npm server.

When someone installs a package from git and runs `yarn`, yarn will check if a package with that name exists on npm. If so, it will try to use it. This can yield hard-to-debug errors where `package-b` is a transitive dependency that has been packed with the wrong version.

Describe the solution you'd like
A field in package.json that can be interpolated by the `pack` plugin. It might be called `protocolFormat`.

(There are all sorts of permutations of `pack`, `version`, `dependency`, `protocol`, `remote`, and `format` that might yield a key name. I'm avoiding bikeshedding on that here.)
When `pack` encounters a manifest with this field, it uses it to replace `workspace:`. The value will be copied verbatim, with some simple string substitutions:

- `$VERSION`: `package.version`
- `$PACKAGE_NAME`: `package.name`
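A minimal TypeScript sketch of the substitution, to show how small the change would be. Everything here is hypothetical: `protocolFormat` is only the proposed field name, `FormattableManifest`, `applyProtocolFormat`, and `packDependencyRange` are illustrative names, and the git URL in the usage note is a made-up placeholder rather than gitpkg's real tag format.

```typescript
// Hypothetical manifest shape for a workspace package that opts in to the
// proposed `protocolFormat` field.
type FormattableManifest = {
  name: string;
  version: string;
  protocolFormat?: string;
};

// Copies the format verbatim, replacing only the two proposed placeholders.
// split/join is used so that `$` is never treated as a special pattern.
function applyProtocolFormat(dependency: FormattableManifest): string | undefined {
  if (dependency.protocolFormat === undefined) return undefined;
  return dependency.protocolFormat
    .split("$PACKAGE_NAME").join(dependency.name)
    .split("$VERSION").join(dependency.version);
}

// The extra `if` at pack time: use the formatted value when the dependency
// declares one. The fallback to the bare version is a simplified stand-in for
// the existing behavior.
function packDependencyRange(range: string, dependency: FormattableManifest): string {
  if (!range.startsWith("workspace:")) return range;
  return applyProtocolFormat(dependency) ?? dependency.version;
}
```

For example, with a placeholder format of `git@example.com:org/repo.git#$PACKAGE_NAME-v$VERSION` on `package-b` at version `1.2.3`, `packDependencyRange("workspace:*", packageB)` would return `git@example.com:org/repo.git#package-b-v1.2.3` instead of a bare version that could be matched against the public registry.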
The most important one is `$VERSION`. `$PACKAGE_NAME` is nice because it lets the same `protocolFormat` be copy/pasted in many packages within a project.

Describe the drawbacks of your solution
Uses a key in the package.json namespace.
Whatever key we pick would have to only be used for this purpose. If another tool in the JS ecosystem picked a similar name for a different feature, the tools would be in conflict.
Is a niche use case in the core codebase
Since using a git remote as a package repository is not a mainstream practice, one could argue that supporting it is outside the scope of the yarn organization.
The footprint of this change is small (adding an `if` condition with a simple string replacement). It is unlikely to need much maintenance in the future.

The assumption that anyone who uses `workspace:` expects their packages to be distributed to an npm server is a faulty one, inconsistent with the rest of the project. If packages were only meant to be published to npm, there wouldn't be a top-level Protocols tab in the docs. This change corrects that assumption.

Simply, its value outweighs its cost.
Allows the dependency version to be changed via user input
One can imagine an attack whereby someone adds a malicious `protocolFormat` to a package.json, changing how a dependent is packed.
While supply chain attacks shouldn't be dismissed, the truth is that if someone has enough access to a codebase to change what's in a sibling package's manifest, the codebase has already been totally compromised regardless of this feature.
Describe alternatives you've considered
Do nothing. Force users to use `resolutions`.
The `resolutions` escape hatch is a workaround for this issue.

There are two problems:

- Transitive dependents are responsible for `resolutions`. `package-a` can't programmatically do anything to ensure that `dependent-project` has the correct `resolutions` field, but will inevitably be blamed when installation fails without one.
- It puts version management for private packages in a different place (`resolutions`) than for public ones (`dependencies`). It's likely that someone will forget to keep them in sync, letting one contain stale data. It's also a source of confusion.

Support `${ arbitrary js expression }` within the format
Nice because it looks like a template string, but adds too much complexity.
It's hard to format a JavaScript expression to fit in a JSON string. It also poses too many questions about which values will be in scope and how they will be sandboxed.
Support any key from `package.json`
One alternative to `${}` would be to disallow any expression, but support any path in a package.json.

Choosing a syntax could be difficult; anything other than `${}` would need to be learned. `${}` implies support for arbitrary whitespace and expressions, which would likely not be the case here.

More broadly, it's unclear what keys beyond `name` and `version` might be useful. Starting with the valuable keys doesn't preclude us from adding more powerful formatting later, but does avoid yak shaving to include potentially unneeded details in an already niche feature.

A plugin
Since `pack` is a plugin, its functionality could be replaced with a plugin.

A fork is ugly. People and tools would need to know to use the fork instead of the original.

A sandwich plugin is also an option - trying to handle `workspace` before `pack` does, and then remembering to restore `workspace` after `pack` finishes.

This solution is needlessly complex and fragile. What happens if the process is aborted before the sandwich has cleaned itself up? How do we ensure that the hooks run in the right order?