nodejs / corepack

Zero-runtime-dependency package acting as bridge between Node projects and their package managers
MIT License
2.62k stars 172 forks source link

Corepack should not append hash to version #316

Open rotu opened 1 year ago

rotu commented 1 year ago

corepack use and corepack up add build metadata to the version, resulting in a semver string that doesn't match any released version.

e.g. "packageManager": "npm@10.2.0+sha256.c362077587b1e782e5aef3dcf85826399ae552ad66b760e2585c4ac11102243f"

The actual version is simply "10.2.0".

This causes problems for tools that expect to find an existing version: https://github.com/moonrepo/proto/issues/231

Additionally, corepack does not tolerate existing build metadata:

e.g. "packageManager": "npm@9.7.0+some-custom-build"

corepack install
Adding npm@9.7.0+some-custom-build to the cache...
Internal Error: Digest method not supported

Related: https://github.com/npm/cli/issues/1479

aduh95 commented 1 year ago

Why shouldn't Corepack append hash to the version? I can see you link to the original PR introducing the functionality, you have probably read that it was security concerns that led to that decision, what are your thoughts on that?

Additionally, corepack does not tolerate existing build metadata

If that's a problem for someone, PRs welcome, but I doubt that there's a use case for specifying metadata except a hash.

rotu commented 1 year ago

Why shouldn't Corepack append hash to the version? I can see you link to the original PR introducing the functionality, you have probably read that it was security concerns that led to that decision, what are your thoughts on that?

There are a couple of reasons:

  1. It makes the packageManager field no longer human-friendly to edit, since it now has a human-readable aspect and a machine-generated aspect.
  2. The version is at the discretion of the author of the package; this hash is an artifact of this tool, not part of the package version.
  3. The hash is not a property of the code; it's a property of the tarball. It depends not just on the version but on the "resolved" url from package-lock.json (which has no analog in corepack and will be a problem for private repositories).

Yes, I think it makes sense to hash the package manager. It doesn't make sense to put it in the version string.

aduh95 commented 1 year ago

Then don’t push the hash / edit it out, no? If you want human editable, it seems only faire that you have to edit it manually, and it’s completely optional. But I have a hard time seeing why you would not chose to give up the security of the hash. Or do you have a suggestion to store the hash somewhere else?

rotu commented 1 year ago

The hash could be stored in package.json in a different field. I'm thinking "peerDependenciesMeta" is appropriate. e.g.:

"peerDependenciesMeta": {
  "pnpm": {
    "resolved": "https://registry.npmjs.org/pnpm/-/pnpm-8.9.0.tgz",
    "integrity": "sha512-74hZk44fBTe5/PAwkEQxE5Lzs4s0QXbmzU/e4hsiVSSwrCobCK4q4t3Vs/9LjKSW1neOlQ8+fJ9VW4EyWYJEHA=="
  }
}

An "integrity" field is the approach adopted by npm and by node in analogy with the Subresource Integrity W3C spec.

arcanis commented 1 year ago

It doesn't really make sense in peerDependencies, as the package manager is not a peer dependency, and even if it was peer dependencies are resolved dynamically based on their ancestors, so it wouldn't make sense for them to have "resolved" or "integrity" fields.

I understand your point that, strictly speaking, a second field could perhaps have been a good idea at the time, but it seems extremely low impact (if it even has any?) to change that now compared to the potential to break tools that already expect Corepack to work this way.

Also worth noting that nothing in the semver spec forbids using the build metadata the way Corepack does, and that it's as far as I know impossible to publish versions on the npm registry w/ build metadata (https://github.com/yarnpkg/berry/issues/5785#issuecomment-1758600086).

rotu commented 1 year ago

Peer dependencies are packages which a package expects to be present on the host system but necessarily installed by the package manager. That describes build tools and package managers to a tee. It would also be perfectly fine to put this in a new field like “packageManagerLock”.

I’m not sure what you mean by “peer dependencies are resolved dynamically based on their ancestors”.

What other tool expect the packageManager field to include build metadata? I suspect it’s only proto, which merely strips it out and therefore wouldn’t break if it were relocated.

The semver spec is written with the tacit assumption that it is the package author defining the version!

I believe that npm does tolerate build metadata in package.json#version; it just doesn’t use it in resolving which package to download, and therefore won’t tolerate it in the tarball’s filename.

aduh95 commented 1 year ago

The semver spec is written with the tacit assumption that it is the package author defining the version!

I think it's fair to say that's your interpretation, that's not how I read the spec.

I believe that npm does tolerate build metadata in package.json#version; it just doesn’t use it in resolving which package to download, and therefore won’t tolerate it in the tarball’s filename.

Here's what they say in their docs: The "version" field must be in the form x.x.x and follow the semantic versioning guidelines. In practice, I'm not sure they enforce it, at least it doesn't complain when running npm pack on a package that has metadata in its "version", and generate a tarball with the metadata in its filename.

Anyway, to come back to the topic on hand, I agree with @arcanis that changing the syntax now is a non-zero cost that seems hard to justify unless the current syntax breaks something. If someone opens a PR to move the hash to somewhere else, we would probably still want to support the current syntax.

rotu commented 1 year ago

Yes, npm ignores part of the package version. That's probably a mistake in retrospect. I'm glad they support prerelease identifiers even though those linked pages don't mention it!

Using it as "free real estate" in corepack not only relies on npm's decision here, but precludes package managers from using build metadata, since it would break corepack!


Say I'm super-paranoid and download and build the package manager myself.

In my peerDependencies, I'd have:

"npm": "github:npm/cli#57a895781dda201956c645be3e8bb8ac93eade1f"

And the integrity fields would look something like:

"version": "10.2.0",
"resolved": "git+ssh://git@github.com/npm/cli.git#57a895781dda201956c645be3e8bb8ac93eade1f",
"integrity": "sha512-yRjLG88QeGJqtBorxfHq/92pxlOJhPOd2Xm+sHkmygMoKgQbe6rnHotejEtGMvrWMS8YaBE5hNiGS75a2e3/hQ=="

By contrast, corepack's current design of associating a hash directly with the version number precludes this. You can't link the package manager back to source code, because it's tied specifically to a tarball (presumably the one on npm's public repo).


Changing the syntax now is a non-zero cost, but it is going to get more expensive the further down the line we go. Whenever this is done, it should probably also use sha512 and base-64 encode the result so it can be visually compared with the integrity field in the npm Registry.

I'm starting to think that maybe the packageManager field should have just the brand (e.g. npm/yarn/pnpm/bun) and that the versioning constraint belongs in peerDependencies.

aduh95 commented 1 year ago

Using it as "free real estate" in corepack not only relies on npm's decision here, but precludes package managers from using build metadata, since it would break corepack!

I don’t see how we “rely on npm decision” here, we rely on the semver spec (which arguably is a limitation, but one that makes sense IMO). Quoting the spec: “Build metadata MUST be ignored when determining version precedence. Thus two versions that differ only in the build metadata, have the same precedence”. I think there’s a misunderstanding on what means “build metadata” in the semver spec, my interpretation of the spec is that their use in Corepack makes perfect sense.

corepack's current design of associating a hash directly with the version number precludes this. You can't link the package manager back to source code, because it's tied specifically to a tarball

Sure you can, you may need to specify the hash of the source code tarball in a different field, but there’s no reason Corepack compat would get in your way. If you build the package manager in the Corepack’s cache, you could even use Corepack with your custom build.

it should probably also use sha512

FYI Corepack already supports all the hash algorithms supported by Node.js, including SHA-512.


Anyway, I feel it’s not productive to keep discussing, everyone has made their point. If someone opens a PR to propose changes, we can discuss the details.

rotu commented 1 year ago

I don’t see how we “rely on npm decision” here, you rely on the semver spec (which arguably is a limitation, but one that makes sense IMO). Quoting the spec: “Build metadata MUST be ignored when determining version precedence. Thus two versions that differ only in the build metadata, have the same precedence”.

That's determining version precedence, not subsumption, and I think it's the core of our misunderstanding. If the user specifies 1.0.1, then npm does not give them 1.0.2, even though it has higher precedence.

It's npm's decision to ignore the build metadata entirely, not just ignore it for precedence purposes.


Another confusing consequence of the current setup is that, passing corepack use the value of an existing packageManager field (1) doesn't check the passed hash (2) renders corepack confused:

$ corepack use "npm@10.2.0+sha256.c362077587b1e782e5aef3dcf85826399ae552ad66b760e2585c4ac11102243f"
Installing npm@10.2.0+sha256.c362077587b1e782e5aef3dcf85826399ae552ad66b760e2585c4ac11102243f in the project...
$ corepack up
Usage Error: Invalid package manager specification in package.json; expected a semver version
$ npm pkg get packageManager
"npm@10.2.0+sha256.c362077587b1e782e5aef3dcf85826399ae552ad66b760e2585c4ac11102243f+sha256.c362077587b1e782e5aef3dcf85826399ae552ad66b760e2585c4ac11102243f"
arcanis commented 1 year ago

That's determining version precedence, not subsumption, and I think it's the core of our misunderstanding

I don't think there's a misunderstanding; we understand you believe your interpretation is the right one, but we believe the spec doesn't make this clear, and that your interpretation is what it is - just one possible interpretation.

And interestingly it doesn't even seem to be supported by the semver package, as running semver.valid on a version with build metadata simply strips the metadata. You perhaps can argue that the semver package isn't the semver spec, but at this point it's really splitting hairs.

ziyuang commented 6 months ago

By the way is it possible to specify whether to use SHA256 or SHA512? corepack under Ubuntu attaches SHA512 hash while it attaches SHA256 hash under Windows.