openjs-foundation / package-metadata-interoperability-collab-space

The goal of the Package Metadata Interoperability Collab Space is to improve how JavaScript developers define their packages across the ecosystem. The group is currently working to better understand package.json and how developers use it to define their projects.
27 stars 4 forks source link

Define spec for development environment field #15

Closed GeoffreyBooth closed 1 month ago

GeoffreyBooth commented 9 months ago

This is a draft spec for https://github.com/nodejs/node/issues/51888#issuecomment-1967102442, to define the runtime and package manager for developing a project or package (not consuming/installing it as a dependency in another project). We could use the name devEngines and it would be a new top-level field defined with this schema:

interface DevEngines {
  os?: DevEngineDependency | DevEngineDependency[];
  cpu?: DevEngineDependency | DevEngineDependency[];
  libc?: DevEngineDependency | DevEngineDependency[];
  runtime?: DevEngineDependency | DevEngineDependency[];
  packageManager?: DevEngineDependency | DevEngineDependency[];
}

interface DevEngineDependency {
  name: string;
  version?: string;
  onFail?: 'ignore' | 'warn' | 'error' | 'download';
}

The os, cpu, libc, runtime, and packageManager sub-fields could either be an object or an array of objects, if the user wants to define multiple acceptable OSes, CPUs, C compilers, runtimes or package managers. The first acceptable option would be used, and onFail would be triggered for the last defined option if none validate. If unspecified, onFail defaults to error for the non-array notation; or it defaults to error for the last element in the array and ignore for all prior elements, for the array notation. Validation would check the name and version ranges.

The name field would be a string, corresponding to different sources depending on the parent field:

The version field syntax would match that defined for engines.node, so something like ">= 16.0.0 < 22" or ">= 20". If unspecified, any version matches.

The onFail field defines what should happen if validation fails:

In the event of onFail: 'download', it would be the responsibility of the tool to determine what and how to download, perhaps by looking in the tool’s associated lockfile for a specific version and integrity hash. It could also be supported on a case-by-case basis, like perhaps Yarn and pnpm could support downloading a satisfactory version while npm would error.

Typical example:

"devEngines": {
  "runtime": {
    "name": "node",
    "version": ">= 20.0.0",
    "onFail": "error"
  },
  "packageManager": {
    "name": "yarn",
    "version": "3.2.3",
    "onFail": "download"
  }
}

“Uses every possible field” example:

"devEngines": {
  "os": {
    "name": "darwin",
    "version": ">= 23.0.0"
  },
  "cpu": [
    {
      "name": "arm"
    }, {
      "name": "x86"
    }
  ],
  "libc": {
    "name": "glibc"
  },
  "runtime": [
    {
      "name": "bun",
      "version": ">= 1.0.0",
      "onFail": "ignore"
    },
    {
      "name": "node",
      "version": ">= 20.0.0",
      "onFail": "error"
    },
  ],
  "packageManager": [
    {
      "name": "bun",
      "version": ">= 1.0.0",
      "onFail": "ignore"
    },
    {
      "name": "yarn",
      "version": "3.2.3",
      "onFail": "download"
    }
  ]
}

Some potential future expansions of this spec that have been discussed are:

ljharb commented 9 months ago

Can you just comment the proposal here? This repo only exists to document what already exists; it doesn't make sense to make a PR until that's the case.

wesleytodd commented 9 months ago

I agree and think we should write this proposal up as prose first. I think we set the initial scope to be "documentation only" as a way to take off a good split task that was not huge scope. Technically this repo is not even for that documentation, this one is: https://github.com/openjs-foundation/package-json-research

GeoffreyBooth commented 9 months ago

Okay, how about something like this. We could use the name devEngines and it would be defined with this schema:

interface DevEngines {
  runtime?: DevEngineDependency | DevEngineDependency[];
  packageManager?: DevEngineDependency | DevEngineDependency[];
}

interface DevEngineDependency {
  name: string;
  version?: string;
  os?: string[];
  cpu?: string[];
  onFail?: 'ignore' | 'warn' | 'error' | 'download';
  download?: {
    url: string;
    hash: string;
  }
}

Both runtime and packageManager sub-fields could either be an object or an array of objects, if the user wants to define multiple acceptable runtimes or package managers. The first acceptable option would be used, and onFail would be triggered for the last defined option if none validate. Validation would check the version ranges, OS and CPU constraints.

The runtime.name field would be a string from the list of WinterCG Runtime Keys. The packageManager.name field . . . I don’t know, we could define the list as part of the spec, or perhaps it would need to correspond to an npm registry package name. Suggestions welcome.

The version field would match the definition of the value of engines.node, so something like ">= 16.0.0 < 22" or ">= 20". If unspecified, any version matches.

The os and cpu fields would match their correspondents in https://docs.npmjs.com/cli/v10/configuring-npm/package-json#os and https://docs.npmjs.com/cli/v10/configuring-npm/package-json#cpu. If unspecified, any OS or CPU matches.

The download field would apply only for onFail: 'download'. I assume no one would support for runtime. It could also be supported on a case-by-case basis, like perhaps Yarn and pnpm could support downloading a satisfactory version while npm would error.

I think this covers all the ideas from https://github.com/nodejs/node/issues/51888#issuecomment-1967102442 and https://github.com/nodejs/node/issues/51888#issuecomment-1967141092. cc @wraithgar

wraithgar commented 9 months ago

onFail is a user directive, not a maintainer directive. It would be unwise to let the maintainers dictate that behavior. A good historic example of why this is not advisable is engines itself. There are many packages out there that specify an engines that looks something like node: "^8.0.0". The maintainer likely had every intention of keeping that up to date but now that package doesn't work in node 20. If they also had been able to specify "fail" then the package would stop working, even if it works fine. Because packages are idempotent this can't be fixed. The end user is ultimately the only one who can make the decision on whether or not this constraint is a blocker.

GeoffreyBooth commented 9 months ago

onFail is a user directive, not a maintainer directive.

But in this case, the user is the maintainer. This whole configuration block only applies when it’s the developer of the package/project.

wraithgar commented 9 months ago

How does download even work with a semver range? If these are all npm registry packages then there is already a well defined discovery mechanism for downloading a given named package at a given version.

I'd love to hear from yarn or pnpm if and how they'd plan on handling the failure state where the user has indicated they want to download the correct version. How do you pick the version? Do you spawn it yourself? Why not fail and let the user remediate?

wraithgar commented 9 months ago

But in this case, the user is the maintainer

Good point. As you can see the lines between "installing as module" and "interacting as app" are so blurry that even I missed that.

GeoffreyBooth commented 9 months ago

How does download even work with a semver range?

If a URL is provided, we’re just downloading that exact URL. The version constraint wouldn’t apply.

If we want to allow onFail: 'download' and a missing download field, the only behavior I can think of would be to query the npm registry for the requested package manager and download the newest version that fits within the range.

wesleytodd commented 9 months ago

Yeah to be clear all of this is for the maintainer. We should include that this proposal would not effect any install behavior other than for the top level project.

GeoffreyBooth commented 9 months ago

I moved my draft into the top post to make it easier to find.

ljharb commented 9 months ago

This is a great start. I think it's important that it not just be limited to the runtime and the package manager - process.versions has a lot of things in it that would be really nice to also constrain in various cases, especially openssl which has lots of CVEs.

styfle commented 9 months ago

Tagging relevant people to review the draft spec: @Ethan-Arrowood @arcanis @zkochan @aduh95

wraithgar commented 9 months ago

I am not very good at reading specs like this. What would it look like in implementation to limit the local dev environment to a given os or cpu for example? From what I see it doesn't look like that field is sufficiently decoupled from "named" which doesn't make sense in that context.

aduh95 commented 9 months ago

packageManager is already used in the wild as a string, we'd need to define what happens when the value is a string to allow a smooth transition – another option is to use a different key, but packageManager is quite good a describing what it is.

wesleytodd commented 9 months ago

but packageManager is quite good a describing what it is .... is already used in the wild as a string

This proposal covers way more than package manager and would not be under that field name (although it is not mentioned above). I think that should have always been the goal, so that is why keeping the field "experimental" in node core was important. I agree we would want to smooth the path. But I think that is up to the folks who implemented the field in their tools. Maybe here we would just discuss goals for ways to make that transition?

ljharb commented 9 months ago

The first step should be figuring out (such that they can be clearly documented and conveyed) the problems we're solving, and the goals and non goals - and then to sketch out a solution that seems viable to all stakeholders.

At that point, those interested in the migration path from the experimental packageManager field to the new thing can explore the best way to achieve that - but until the solution design is settled I'm not sure how any progress could be made on migration.

Ethan-Arrowood commented 9 months ago

Cross-posting from Slack:

I’ve attempted to read all of the various GitHub and Twitter threads that have led us here. I think there is a lot going on in general, and I don’t have a clear, concise reply just yet. For now, all I will say is that I encourage all invested parties to join the monthly call on March 5th. We rarely have much to discuss regarding the group’s active work streams, so we can reasonably spend the entire hour discussing the corepack issue.

For those looking for the upcoming meeting invite, please copy the invite from the OpenJS Calendar

GeoffreyBooth commented 9 months ago

What would it look like in implementation to limit the local dev environment to a given os or cpu for example?

I added those fields because you had mentioned them on the other thread. I'm not sure what the use case would be; run npm on Windows and Bun on macOS? I can just remove them if you can't think of a use case either. We can always add fields later.

ljharb commented 9 months ago

I think that os/cpu etc are in the category of "general engines worth explicitly giving a range for", and altho "runtime" and "package manager" are reasonable to call special, the schema needs a mostly free-for-all dumping ground too (like engines, import.meta, etc are all now) for things like this.

wraithgar commented 9 months ago

Yeah, remember we're defining a spec for all development runtime constraints, not just package manager. Package manager is a subset of constraints. Node version, os, cpu, and libc are other constraints.

zkochan commented 9 months ago

Why would you allow specifying multiple different package managers? I assume there are still people in this group that are strongly against lockfiles. But at least don't motivate devs to use different package managers on a project. It will make everyone's life a lot harder.

wesleytodd commented 9 months ago

Why would you allow specifying multiple different package managers?

My guess is this is done to remain flexible, not make a recommendation that folks should. That said, I could think of a few cases where maybe you want to use pnpm for local dev but npm for publish or something of that nature?

I assume there are still people in this group that are strongly against lockfiles.

I don't think that would be a driving part of the decision is it? I assume you are meaning that if a package declared multiple dev package managers they would then have to resolve the lockfile differences? I think at this layer of abstraction (a spec for the package.json) we would not want to exclude package managers from agreeing on a shared lockfile format (in fact we might want to encourage that 😎) so I think it is reasonable to keep.

styfle commented 9 months ago

How about changing “hash” into two fields: “algorithm” and “digest”?

GeoffreyBooth commented 9 months ago

How about changing “hash” into two fields: “algorithm” and “digest”?

That’s fine, what would be an example? This?

"download": {
  "url": "https://registry.npmjs.org/@yarnpkg/cli-dist/-/cli-dist-3.2.3.tgz",
  "algorithm": "sha224",
  "digest": "16a0797d1710d1fb7ec40ab5c3801b68370a612a9b66ba117ad9924b"
}
wesleytodd commented 9 months ago

algorithm and digest should also be optional (but encouraged for the implementing tools to add/update them).

wraithgar commented 9 months ago

What's the "hash" currently? In npm lockfiles it's the output of ssri.stringify which includes both algorithm and digest

wesleytodd commented 9 months ago

For folks who need it (me) here is what ssri references: https://www.npmjs.com/package/ssri

EDIT: I should have added, this is an implementation of the subresource integrity spec used in browsers as well: https://w3c.github.io/webappsec-subresource-integrity/

So with that in mind, I would say that seems like it could be a good spec to just directly here.

GeoffreyBooth commented 9 months ago

My example came from https://github.com/nodejs/corepack#when-authoring-packages. Per https://github.com/zkat/ssri/blob/latest/README.md#--ssriparsesri-opts—integrity, you can run ssri.parse(sri) where sri is either a combined string like sha224.953c8233f7a92884eee2de69a1b92d1f2ec1655e66d08071ba9a02fa or an object like { algorithm: 'sha224', digest: '953c8233f7a92884eee2de69a1b92d1f2ec1655e66d08071ba9a02fa' }

wesleytodd commented 9 months ago

Yeah I would propose we directly commit to supporting exactly what this ssri package implements. So if there is a call out we can reference great, if not we can just make the original proposal reflect that it supports both formats.

GeoffreyBooth commented 9 months ago

I think this is probably close enough that PRs could be started to implement this? I think one each for npm and Corepack, and the implementation process should shake out any remaining questions. And if other package managers want to read the field directly too, similar to how npm will, they can always add support.

For npm, there’s https://github.com/npm/cli/pull/7253 which I think could be updated to support the above schema?

@styfle, do you think this is ready for Corepack to implement? You’d have to decide what it should do when both this field and packageManager are present (like either have the new one take precedence, or maybe error only when they have download definitions that mismatch). Is this something you want to take on?

wraithgar commented 9 months ago

I think we are confusing fields a little too much still. Things that are relevant to packageManager are usually only relevant in that context. The other things generally don't affect that one.

How about something that builds off of engines, wraps the os etc back up into it, and gives packageManager a proper namespace instead of trying to put them in the root object indexed by names (like we do now with engines.npm)?

interface DevEngines {
  os?: string[];
  cpu?: string[];
  libc?: string[];
  node? string;
  packageManager?: DevEngineDependency | DevEngineDependency[];
}

interface DevEngineDependency {
  name: string;
  version?: string;
  onFail?: 'ignore' | 'warn' | 'error' | 'download';
  download?: {
    url: string;
    algorithm?: string;
    digest?: string;
  }
}

The root os, libc, etc and engines would be (eventually) only evaluated at install time.

GeoffreyBooth commented 9 months ago

@wraithgar Re the new proposal, I don’t think it’s an improvement over what we have for a few reasons:

ljharb commented 9 months ago

It doesn't seem like a good idea to allow for arbitrary downloads from "not an npm registry". It's fine if an error message includes a URL but automatic downloads are going to raise a lot of security eyebrows (as they already do on corepack and npx).

wesleytodd commented 9 months ago

I agree with all that @GeoffreyBooth posted.

It doesn't seem like a good idea to allow for arbitrary downloads from "not an npm registry".

I don't think we should dictate this and focus on the form and not the content/implementation. If a tool wants to support other things, fine. FWIW, I could see a really strong argument for companies downloading from an assortment of internal systems.

GeoffreyBooth commented 9 months ago

I could see a really strong argument for companies downloading from an assortment of internal systems.

I also don’t anticipate npm supporting onFail: 'download' at all, so it’s not relevant for the npm implementation, which should be the first one.

ljharb commented 9 months ago

@wesleytodd would the internal systems being "artifactory or similar" really be a barrier?

wesleytodd commented 9 months ago

would the internal systems being "artifactory or similar" really be a barrier?

No, but I also don't see value in that restriction at this level. I think saying the contract is a tarball at a url then leave it to the tool to decide what it supports doing with that tarball. Which is really all that the registry api is providing anyway. I mainly just think this should be left up to tools implementing against this and not the proposal itself.

wesleytodd commented 9 months ago

Actually, maybe even the contract is not "a tarball at a location" and is more "a string with meaning the tool understands". Which is even lower bar for what we "define as the spec".

EDIT: Probably an early send on that. My thinking for that was if a runtime manager wanted to support npm registry specs then they should be able to do that while keeping the same api contract in package.json. Or am I missing something on this idea?

ljharb commented 9 months ago

I totally agree that this spec shouldn't mandate anything but a string, for the reasons indicated - i'm suggesting that perhaps in v1 it shouldn't even have an implication that it's a URL - even though an implementation can do whatever they want with "a string".

wraithgar commented 9 months ago

The package metadata interop group meets March 5th I believe. This should be discussed there, and after that the npm team can work on an implementation. Likely using the existing devEngines PR as a reference point.

wraithgar commented 9 months ago

I still don't understand how "os" is any less of a "runtime" directive than "package manager" but it's not an issue I'm willing to push back THAT hard on. I can remain confused if I'm the only one confused.

wraithgar commented 9 months ago

Also, as someone who has had to write the code to parse engines, os, etc I would humbly ask that we go ahead and bake in that these fields that can be arrays are always arrays. It's a lot easier to parse.

GeoffreyBooth commented 9 months ago

Also, as someone who has had to write the code to parse engines, os, etc I would humbly ask that we go ahead and bake in that these fields that can be arrays are always arrays. It’s a lot easier to parse.

That’s already the case for os and cpu. I think for the top-level runtimes and packageManager fields we really should support either an object or an array of objects. I think the vast majority of projects will want to define a single development runtime and a single development package manager, so we should add a quick Array.isArray check to allow them to do so.

GeoffreyBooth commented 8 months ago

@zkochan Does pnpm have any thoughts on the proposed new package.json field described in the top comment? Does this seem like something that, if npm added support for, pnpm would do the same? Are there any concerns about the spec or missing functionality that you would want added?

zkochan commented 8 months ago

I prefer the packageManager field. I don't think this new field is needed. My only problem with packageManager is that it only allows exact versions.

In any case, I know I don't have influence in these decisions. If npm will use this field, pnpm will probably support packageManager and the new field too.

aduh95 commented 8 months ago

FWIW I see no reason why the two fields could not cohabitate (if not indefinitely, at least for a while). As far as Corepack is concerned, it will likely defer to the package managers the parsing of devEngines and turn off its "strict mode" by default (the feature that emits an error if someone tries to run e.g. pnpm i on a project that has a packageManager field referencing a different package manager). Just my personal opinion of course.

GeoffreyBooth commented 8 months ago

In any case, I know I don’t have influence in these decisions. If npm will use this field, pnpm will probably support packageManager and the new field too.

We’re inviting you to have influence 😄 This is the place.

npm also wants version ranges. While we could expand packageManager to allow defining ranges, the npm team wants some other features such as specifying whether an out-of-range validation should error or warn. This is a bit much to squeeze into a string value, hence the proposal for a new field that takes an object. The idea is that this field could contain everything that all consumers could want; npm wouldn’t implement the download parts, and others could choose to ignore the warning aspect if they always want to error, etc. The new field would be more flexible to handle additional use cases that will inevitably come up.

wesleytodd commented 8 months ago

Yeah you have a lot of input and value in this conversation! That said, I would like to probe on why you prefer the former? Do you dislike the idea of this being used for node version specification as well? If so I am interested in your feedback on that.

darcyclarke commented 8 months ago

Is there a shorthand we could align on? It seems like we're on the right path with the primitives but I doubt anyone would write these by hand (especially download or onFail) as they'd be prone to error/misconfiguration. I threw together a quick example of how I imagine this but it would be good to get a sanity check:

{
  "devEngines": {
    "cpu": "ia32",
    "os": "linux",
    "libc": "glibc",
    "runtime": "node@>10",
    "packageManager": "pnpm@latest"
  }
}

The above assumes that runtime & packageManager values could reference any valid spec which can be parsed by something like npm's npm-package-arg & names would be associated with the configured registry by default (just like dependencies). This would mean that node & bun would be supported out the gate (because there are distributions for those in the widely used npm public registry today) but you could still define any url or alias for runtime or packageManager as a source/download.

Also, do I understand it correctly that all of these fields can/should be Arrays(?). I think that means that runtime & packageManager could be a String, Array or Object (based on the schema in the top comment). I guess this would be helpful for scenarios like the one I mentioned here & folks may want to use more then one "package manager" in their projects or support more then one "runtime" for testing 🤷🏼‍♂️.

{
  "devEngines": {
    "cpu": ["ia32"],
    "os": ["linux"],
    "libc": ["glibc"],
    "runtime": ["node@20", "bun@1"],
    "packageManager": ["pnpm@latest", "npm@latest"]
  }
}

The most likely of all scenarios though is that devs will just define a single runtime & packageManager (no different then how they do with engines.node or engines.npm today) so that should be as simple & straightforward as possible...

{
  "devEngines": {
    "runtime": "node@20",
    "packageManager": "npm@10"
  }
}
Ethan-Arrowood commented 8 months ago

I'm +1 for arrays by default. Not a huge issue IMO to make even for the simple case:

{
  "devEngines": {
    "runtime": ["node@20"],
    "packageManager": ["npm@10"]
  }
}