arcanis opened 3 years ago
cc @zkochan @ruyadorno
how will the artifacts actually be bundled with the package? All the different artifacts will be in the same package?
No, you'd publish one package per variant, each synchronized version-wise with the main one. So given the first example, the following package would exist on the registry:
prisma (that would be the package anyone would use)
prisma-prebuilt-win32-napi4
prisma-prebuilt-win32-napi6
prisma-prebuilt-darwin-napi4
prisma-prebuilt-darwin-napi6

This is similar to what already happens. For example, esbuild-linux-xxx.
This seems dangerous and unreliable.
How so? Do you have an example of a case where this wouldn't fly? (I can see one I didn't describe in the RFC: the tooling needs to somehow publish those variants; I'll address it in the OP, but that should be solvable)
Well, how can you be sure that all those package names are not already occupied in the registry? Also, I can imagine a scenario where someone will publish a malicious variant, and somehow the package manager will download it.
edit: if this will only work with scoped packages, then I have no objections
Well, how can you be sure that all those package names are not already occupied in the registry? Also, I can imagine a scenario where someone will publish a malicious variant, and somehow the package manager will download it. If this will only work with scoped packages, then I have no objections
That's a very good point - yep, requiring the pattern field to have a scope seems a safer requirement. Will update accordingly.
@arcanis This looks great, thanks for notifying me about it. I'll make some time to give it a deeper review, but off the top of my head:
platform:linux may need to differentiate between libc implementations, e.g. platform:linux (glibc assumed) vs platform:linuxmusl; Node.js doesn't expose this, hence detect-libc.

arch:arm and arch:arm64 may need to differentiate between versions from process.config.variables.arm_version, e.g. arch:armv6, arch:armv7, arch:arm64v8, arch:arm64v9.

What about platform:darwin arch:arm64v8 and platform:linux arch:x64 for local M1 macOS and remote AWS Lambda using the same "installation"?

We discussed a few possible approaches to this in npm using overrides, but several approaches like this one ran into our hard constraint that the dependency graph resolution must be deterministic for a given set of dependencies, regardless of platform.
If prisma has different dependencies from prisma-win32-x64, which in turn has different dependencies from prisma-linux-ia32, then we're in real trouble. Two installations at the same time with the same package.json will result in different package-lock.json files, which is a recipe for disaster.
We are sensitive to this use case however, and intend to propose an alternative rfc for the npm cli in the next few days.
If prisma has different dependencies from prisma-win32-x64, which in turn has different dependencies from prisma-linux-ia32, then we're in real trouble. Two installations at the same time with the same package.json will result in different package-lock.json files, which is a recipe for disaster.
This is covered in "Cache integration" (although it doesn't explicitly mention the lockfile at the moment). Users would be allowed to mention the platforms they want to "overfetch", and they would all be stored in the same lockfile.
I hit a use case for this at my work and remembered the gist from a while back. I wrote an RFC, erroneously thinking https://github.com/yarnpkg/rfcs was still active, without checking here first (whoops!). I've reproduced the relevant parts below, with updates from discussions on Discord and from information in this RFC.
For some context, we make a hardware-focused user interface library that utilises the Electron runtime, with React Native and Web targets on the way. We use several native modules such as node-serialport and node-usb, and have developed others in-house for things such as an embedded time series database. We'd like to open source some of these native modules, but the publishing story is quite fractured and we'd like to improve that story. Our usage of OSS native modules requires the use of 4 separate 'standards' of native module distribution.
We have native modules that can't be distributed via WASM; native modules are required. We would like to provide pre-built artifacts instead of requiring compilation on end user systems.
A unified method of native module publishing with the following high level goals is desired:
Native artifacts are stored in the registry, not out of band in GitHub Releases, S3, etc.
Package and download only the minimum artifacts for any given operating system / platform.
Statically link the native modules.
Fetching these artifacts is the responsibility of the package manager.
Zero-install support
Storage of artifacts in the registry results in fewer points of failure.
GitHub isn't a particularly good CDN.
We've had outright failures to install due to GitHub CDN failures in the past when NPM has been functioning fine, and frequently have slow downloads from GitHub's CDN.
Packaging every combination of supported platforms into the npm release results in a combinatorial explosion of bandwidth required for a singular download. Ideally, only the artifacts required are fetched.
We had one instance where a Golang native dependency, when compiled for 32bit windows, was incorrectly identified as a Trojan by Windows Defender on 64bit systems.
One of our dependencies tries to pull in 100MB of unnecessary native modules per download as a result of their chosen 'prebuild' solution.
The bundler of choice should be able to statically analyse the dependency tree, including the native module resolution. Webpack for example should be able to take the file and treat it like any other asset, giving it a hash, storing it as per its ruleset.
I often see Electron apps that use bundlers marking the entire native module package as external, resulting in the distribution of significant unnecessary files, such as the source code, docs, and often, due to violating other goals, the native dependencies for other operating systems.
Running build scripts takes a long time, and is prone to failure. The package manager has all the information it needs to do this job, and in packages with native modules, the native bit is part of the package and should also be fetched.
It is valuable to be able to copy the cache directory and be sure that later, the application will be able to build and run.
We pass the yarn cache around on CI to speed up our pipelines, but due to our use of native modules, we run yarn install to run our native module fetching plugins.
Zero-install support is essentially at odds with static linking; perhaps this can be done in the PnP hook, so that from the perspective of the bundler the packages are statically linked?
Prebuild uploads artifacts out of band, then downloads them with an install script. Only what's needed is pulled. The artifacts aren't statically linked. This solution requires build scripts.
Prebuildify uploads all artifacts into the same npm release. It's in the registry but the artifacts are very large. The artifacts aren't statically linked. This solution works without build scripts. This solution is supported in zero-install situations.
napi-rs uploads a package per combination of platform and arch supported by their matrix to the registry. When downloading with package managers that support the os and cpu compatibility guards, only what's necessary is downloaded. On Yarn Berry, the entire matrix is downloaded due to these guards not being implemented. The artifacts aren't statically linked. This solution works without build scripts. This solution is technically supported in zero-install situations, since Yarn Berry downloads the whole matrix.
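For reference, the os and cpu guards mentioned here are standard package.json fields. A napi-rs-style variant package declares something along these lines (the package name is illustrative, not a real package):

```json
{
  "name": "@napi-rs/some-package-linux-x64-gnu",
  "version": "1.0.0",
  "os": ["linux"],
  "cpu": ["x64"]
}
```

A guard-aware package manager skips installing this package on any platform whose process.platform / process.arch don't match the declared lists.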
esbuild publishes releases to the registry, but fetches them with a build script (that forks and executes an npm install, falling back to a direct HTTP request). Only what's needed is pulled. The artifacts are statically linked. This solution requires build scripts.
Electron publishes releases as Github artifacts, fetching them with a build script.
Our related prototypes:
yarn-redirect-app-builder is a plugin we wrote to handle one of our native dependencies, app-builder, an Electron build chain utility. We publish to our registry a package per combination of platform and arch. The plugin redirects the request and statically installs the native dependency. Since it's a yarn plugin, no build scripts are required.
yarn-prebuilds is a plugin that rewrites dependencies on the bindings package, fetching the native module from the prebuild-compatible repository and statically linking it in place of the bindings package. It works well enough but the config is scope driven, and different packages have different prebuild mirrors with different formats that can't all be encapsulated in a singular pattern. Since it's a yarn plugin, no build scripts are required.
We have a monorepo that contains various packages, including Electron applications and templates, and a hardware-in-the-loop testing framework. (You may find the logging familiar, thank you!)
The yarn-prebuilds plugin is used to fetch prebuilds for our native modules. Install scripts aren't allowed in the monorepo (we have other tooling to grab the correct Electron builds). The testing frameworks have manual operations required to pull in the native modules for the correct Node ABI, since the majority of the time we have the ones for our Electron build pulled in.
We'd like a solution that's aware of which 'context' the dependency was required in. The Electron templates should pull in the Electron native builds, the tests should pull in the relevant Node native builds.
The Electron ABIs are non-trivial to determine; it doesn't seem like a good fit to me to include something like node-abi in Yarn itself. It seems a better fit for that functionality to live in a plugin.
It should allow different parts of the dependency tree to request different flavors of a same package. Our dependency trees are currently context-free, and any solution should keep it that way.
This goal seems at odds with the requirement.
Our specific use case is a monorepo containing packages intended for use in different runtimes.
shared-dependency
  node-serialport
electron-runtime-package (our templates, development sandbox, integration tests)
  shared-dependency
    node-serialport
node-runtime-package (our unit tests)
  shared-dependency
    node-serialport

We'd like electron-runtime-package to pull in the Electron build of node-serialport, and node-runtime-package to pull in the Node build of node-serialport; tests in shared-dependency should also run with the Node build of node-serialport.
If only the top-level workspace dependencyMeta field is used, I don't think this workflow is possible. Ideally the workspace, at a minimum, would be passed via context during package resolution so that its dependencyMeta field can be read, letting each workspace request different flavours of the same packages.
I don't think the extra verbosity is required here.
Instead of a singular Variants object with a pattern property, I'd argue for an array of Variants objects with a pattern each. Each Variant object can have its own matrix, includes, excludes and pattern. The first one to match is the one used. This would naturally give the ability to have a fallback (a Variant with a pattern and no matrix). This would allow for greater flexibility matching packages that are already publishing their variant packages, without creating a complicated DSL for pattern construction.
Some parameters don't make sense when combined with other parameters; I don't think a simple string replacement with "null" results in an ergonomic package name (sled-null-null-wasm-simd, for example).
Consider an example of a package that builds for different platform, arch and napi combinations, and a wasm build where the arch and napi parameters don't make sense. Consider that the wasm build has a SIMD version and a non-SIMD version, while the non-wasm versions do their SIMD detection at runtime.

Two patterns cover this use case: @scope/package-build-%platform-%arch-%napi for non-wasm builds, then @scope/package-build-%wasm-%simd, where a specific wasm key is either set to wasm or unset, and a simd key can be either simd-supported or no-simd.
As another example:
"variants": [
{
"pattern": "sled-%platform-%napi",
"matrix": {
"platform": [
"darwin",
"win32",
"linux"
],
"napi": [
"5",
"6"
]
},
"exclude": [
{
"platform": "win32",
"napi": 5
}
]
},
{
"pattern": "sled-%platform",
"matrix": {
"platform": [
"wasm"
]
}
},
{
"pattern": "sled-build-sources"
}
]
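To make the first-match rule concrete, here's a hedged sketch of selecting from an ordered variants array like the sled example above. This is not Yarn's actual implementation; the helper names and shapes are invented for illustration, following the JSON structure shown.

```javascript
// Replace every %key token in a pattern with the matching parameter value.
const fill = (pattern, params) =>
  pattern.replace(/%(\w+)/g, (_, key) => params[key]);

// Expand a matrix ({platform: [...], napi: [...]}) into concrete parameter
// sets, dropping any combination listed in `exclude`.
function expandMatrix(matrix = {}, exclude = []) {
  let sets = [{}];
  for (const [key, values] of Object.entries(matrix))
    sets = sets.flatMap(set => values.map(v => ({ ...set, [key]: String(v) })));
  return sets.filter(set =>
    !exclude.some(ex =>
      Object.entries(ex).every(([k, v]) => set[k] === String(v))));
}

// Walk the ordered variants array; the first entry whose expanded matrix
// contains the runtime parameters wins. An entry without a matrix matches
// unconditionally, acting as the fallback.
function selectVariant(variants, runtime) {
  for (const variant of variants) {
    if (!variant.matrix) return fill(variant.pattern, runtime);
    const match = expandMatrix(variant.matrix, variant.exclude)
      .find(set => Object.entries(set).every(([k, v]) => runtime[k] === v));
    if (match) return fill(variant.pattern, match);
  }
  return null; // no variant applies: use the original package
}
```

With the sled example, a darwin/napi-6 runtime resolves to sled-darwin-6, while win32/napi-5 (excluded from the first matrix, not wasm) falls through to the unconditional sled-build-sources entry.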
This would allow fetching of alternative protocols, different to the one used by the meta package.
A version parameter would be passed to the pattern for specifying the version of the package resolved. For example:

@electricui/app-builder-bin-%platform-%arch@%version
https://cdn.foo.com/prisma-build-%platform-%napi-%version.tgz
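A sketch of the substitution step this implies (the function name is illustrative, not a real Yarn API): every %key token in the pattern, including %version, is replaced from the resolved parameter set, and the same mechanism covers both descriptor and URL patterns.

```javascript
// Hypothetical helper: replace every %key token in a descriptor-style
// pattern with the corresponding resolved parameter value, throwing on
// tokens the parameter set doesn't cover.
function resolvePattern(pattern, params) {
  return pattern.replace(/%(\w+)/g, (_, key) => {
    if (!(key in params))
      throw new Error(`Unbound parameter: %${key}`);
    return params[key];
  });
}
```

For instance, resolvePattern("@electricui/app-builder-bin-%platform-%arch@%version", { platform: "darwin", arch: "x64", version: "3.5.13" }) produces the full aliased descriptor for that combination.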
Well, how can you be sure that all those package names are not already occupied in the registry? Also, I can imagine a scenario where someone will publish a malicious variant, and somehow the package manager will download it.
The possible variants are known statically at release-time by the package author, I don't think there's a security issue here, even without the scoping restriction. It would be up to the package author to actually publish the packages intended.
Having scoping restrictions while also allowing arbitrary descriptors seems incongruent.
I've started a prototype implementation over at https://github.com/Mike-Dax/berry/pull/1
It parses most of the syntax described in this RFC, populates the cache with the matrix of variants specified, and pretends to do package replacement.
Variant cache may hold up to 6 versions
A variant fetched a cache entry: app-builder-bin@npm:3.5.13 -> @electricui/app-builder-bin-darwin-x64@npm:3.5.13
A variant fetched a cache entry: app-builder-bin@npm:3.5.13 -> @electricui/app-builder-bin-linux-x64@npm:3.5.13
A variant fetched a cache entry: app-builder-bin@npm:3.5.13 -> @electricui/app-builder-bin-win32-x64@npm:3.5.13
A variant replaced a package: app-builder-bin@npm:3.5.13 -> @electricui/app-builder-bin-darwin-x64@npm:3.5.13 with environment: {"platform":"darwin","arch":"x64","abi":"83","napi":"7"}
In the above example, builder-util is a package that depends on app-builder-bin, and packageExtensions overwrites app-builder-bin with Variants to grab our fork that publishes a package per combination. The variants system removes the dependency from builder-util on app-builder-bin and injects app-builder-bin-darwin-x64 in its place.
Parameters are created with several plugin hooks:
A reduceVariantStartingParameters hook lets plugins set keys that are propagated per workspace. These would be used to set things based on the Node runtime, for example the platform, arch, etc.

A reduceVariantParameters hook lets plugins set keys from a package 'down', inclusive, based on the package's dependencies. This would be useful for an Electron plugin that sets the runtime to Electron, and the Node ABI version appropriately, if a package depends on Electron. This kind of thing feels best left as a plugin, since something like node-abi would have to be pulled in to fulfil it.

A reduceVariantParameterComparators hook lets plugins define compatibility relationships between values of the parameter keys. For example, Node is backwards compatible with NAPI versions, but not ABI versions for non-NAPI packages. This hook lets an napi key be backwards compatible.
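A minimal sketch of what such a comparator registry could look like (an assumption on my part, mirroring the hook's described intent rather than any real API): a comparator registered for the napi key accepts any artifact at or below the runtime's NAPI version, while abi requires an exact match and unregistered keys default to strict equality.

```javascript
// Illustrative comparator table: each entry decides whether an available
// artifact parameter value is compatible with the runtime's value.
const comparators = {
  napi: (available, runtime) => Number(available) <= Number(runtime), // backwards compatible
  abi: (available, runtime) => available === runtime,                 // exact match only
};

// Check one parameter, falling back to strict equality when no
// comparator has been registered for that key.
function parameterMatches(key, available, runtime) {
  const compare = comparators[key] ?? ((a, b) => a === b);
  return compare(available, runtime);
}
```

So a prebuilt napi-4 artifact would be accepted on a napi-7 runtime, but an abi-83 artifact would be rejected on an abi-89 runtime.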
This implementation was started before I read this RFC, so it's quite context heavy and doesn't yet use dependenciesMeta in exactly the same way.
I've got implementation specific questions but they're probably best addressed somewhere other than this RFC.
Thanks for all your hard work as always everyone.
Thanks for the write-up! I've focused around the specific parts you mentioned as divergences; overall I think we're on the same page, and the changes you suggest seem fair to me.
It should allow different parts of the dependency tree to request different flavors of a same package. Our dependency trees are currently context-free, and any solution should keep it that way.
This goal seems at odds with the requirement.
A better wording might be "side-effect free". In short, singleton flavours wouldn't work: different workspaces are allowed to have completely different purposes, and the fact that one of them requires a transpiled build doesn't mean all would.
With dependencyMeta, each individual package's meta would still be taken into account (thus being an exception, like mentioned in the field's documentation).
Remove the candidates key from the matrix
I added a specific key because I'm not sure whether we'd want to provide a way for users to document the parameters or not. But yes, perhaps it can be skipped for the MVP, it's not critical at all.
Variants field is an array of Variants
That seems a neat solution to the problem of "what if a parameter only makes sense with others". As long as we're clear on which variant matches if multiple would (probably the first), it's fine by me.
The pattern is a Descriptor instead of a package name.
Yep, good idea!
Do I understand correctly, that this will allow to have only a single dependency with variants? What if several dependencies need variants? Also, what if I want variants in optionalDependencies or devDependencies, not dependencies?
It also requires those variants to be synchronized in terms of version with the original package (ie, if prisma is 1.2.3, then the prebuilt versions will have to be @prisma/prisma-prebuilt-something@1.2.3 as well).
This means that if there is a bug in one of the artifacts, all the artifacts need to be republished. Also, these artifacts are usually heavy, so the registry's disk space usage will increase. Is there a workaround?
Do I understand correctly, that this will allow to have only a single dependency with variants? What if several dependencies need variants? Also, what if I want variants in optionalDependencies or devDependencies, not dependencies?
Variants aren't defined per dependency, but per package. For example, let's say you write my-toolchain, which depends on esbuild: my-toolchain won't have to define any variant. It's esbuild which would define them in its package.json. As a result, nothing prevents my-toolchain from depending on multiple packages with variants, and they can be any type of dependencies. Does that make sense?
This means that if there is a bug in one of the artifacts, all the artifacts need to be republished. Also, these artifacts are usually heavy, so the registry's disk space usage will increase. Is there a workaround?
There's one; let's say you have esbuild-windows and esbuild-osx, both at 1.0. Let's say you need to ship a fix to OSX with 1.1, but really don't want to send the same artifacts twice for Windows. A valid workaround is to publish an esbuild-windows@1.1 package that depends on esbuild-windows@1.0 and re-exports its exports. It's a little unusual in the current ecosystem, but nothing semantically prevents it from existing.
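Sketched concretely (an assumption on my part, using npm's alias protocol npm:name@version so a package can depend on an older version of its own name), the esbuild-windows@1.1 shim could look like:

```json
{
  "name": "esbuild-windows",
  "version": "1.1.0",
  "main": "index.js",
  "dependencies": {
    "esbuild-windows-v1": "npm:esbuild-windows@1.0.0"
  }
}
```

with an index.js that simply does module.exports = require('esbuild-windows-v1'), so the heavy 1.0 artifacts are reused from the registry rather than republished.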
Another interesting use-case for Package Variants could be to ship translations as separate packages (@moment/locale-%locale), and let users define the set of languages they'd want to see installed via dependenciesMeta.
Compared to the original RFC only one thing would be required to support this use case: to accept parameter arrays. At the moment parameters can only have one single value, which perhaps is a bit limiting.
{
"dependenciesMeta": {
"moment": {
"parameters": {
"locale": ["en", "fr", "de"]
}
}
}
}
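The expansion this implies could be sketched like so (a hypothetical helper, not part of any RFC text): each array-valued parameter multiplies out into one resolved package name per value.

```javascript
// Expand array-valued parameters (as in the dependenciesMeta example
// above) into one resolved variant package name per combination.
function expandParameterArrays(pattern, parameters) {
  // Cartesian product over every parameter; scalars count as one-element arrays.
  let sets = [{}];
  for (const [key, value] of Object.entries(parameters)) {
    const values = Array.isArray(value) ? value : [value];
    sets = sets.flatMap(set => values.map(v => ({ ...set, [key]: v })));
  }
  return sets.map(set =>
    pattern.replace(/%(\w+)/g, (_, key) => set[key]));
}
```

Given the example above, the locale array ["en", "fr", "de"] against @moment/locale-%locale would yield three locale packages to install alongside moment.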
Pinging @devongovett since I thought about it after seeing your tweet about this.
hi @arcanis thanks for the ping! we are starting to discuss this topic in our Package Distributions RFC as previously mentioned by @isaacs in this thread.
Hey @ruyadorno; thanks for the ping, but the comment you're referring to was made about a year ago.
I have to say, I'm a little annoyed that you (as in, your team) decided to go your own way rather than attempt to discuss with us. I'm not going to lie: from the outside, it kinda looks like things you implement have to come from you exclusively, or they might as well not matter. I'm sure you didn't intend it that way, but that's how it feels.

Please understand that, historically, relationships with the npm cli have been rocky at best. Trusting npm burned us more than once (for instance, the Tink release; or more recently how the peer dependencies discussion was handled). Despite that, we've always been more than willing to give you new opportunities to show us you cared about us as partners rather than footmen, but your team seldom took them.
If you think I'm being a little uncharitable, please keep in mind the following comment, which izs wrote around the time he posted the comment you're referencing. Then tell me: how can we trust that your RFC process will let us contribute (including with the ability to veto an ecosystem-wide change which we would end up being pressured to implement), when your (former) project lead was very clear it isn't intended for this?
I think that you misunderstand the purpose of this repository, and I encourage you to review the README.md file.
This is the npm RFC process, not the "all JavaScript package managers" RFC process. npm is the final authority as to which features npm implements. If you wish to set up a space in which the maintainers of yarn, pnpm, and npm agree to block one another from implementing features until all of us agree on them, then that is a much larger conversation. But this is not that forum, and never has been. That forum does not exist.
FYI npm has a similar proposal now: https://github.com/npm/rfcs/pull/519
@Haroenv That's what @ruyadorno linked in https://github.com/yarnpkg/berry/issues/2751#issuecomment-1028236910
hi @arcanis thanks for taking the time to write a reply, I appreciate the contextualization.
I'm really sorry to hear you are annoyed that we opened our own RFC, I assure you that opening an RFC is just part of the process and while there are different ideas in our first draft I hope there's still ways to collaborate in a compatible solution.
I don't want to hijack the thread here into a conversation but I'd love to chat and follow up with the points you brought up, I'll make sure to reach out. Thanks again!
Just to add my two cents as someone who writes/maintains various node addons:
A short while ago I reluctantly wrote my own set of tools (here and here) to try to solve the prebuilt binary situation, and through that process I discovered a lot of features that existing solutions are missing. I would love to see a cross-package manager solution that works out of the box and features:

selection based on the runtime's ABI version (process.versions.modules)
a buildDependencies set of package dependencies that only get installed if it is determined that a build/compile is needed

Ideally these core features would be implemented in some shared module that also either exposes or expects some sort of stable interface that binary provider/transport modules could use or implement, to be able to grab binaries from different sources. For example, there could be a GitHub provider, an S3 provider, an FTP provider, a plain HTTP provider, etc. With this kind of setup, a package maintainer could for example have a GitHub CI workflow that produces the output necessary to be consumed by the GitHub binary provider (I'd imagine in this case the workflow and binary provider could both be housed under the same repo).
:wave: greetings from the npm cli team!
sorry to bring up an old issue but i'm doing research and gathering use cases for this feature on the npm side of things. i would very much like to collaborate with you folks on this. i've been reviewing this rfc as well as the one we wrote, and i think there's a very good possibility we can come to a consensus that supports all of us. i think we can all agree that it's very much to the benefit of everyone, package manager developers and consumers, if we can work together.
i'm planning to document all of the various use cases that i've managed to gather so that we can distill exactly what features we would like to support. if any stake holders for yarn are available to attend one of our open office hours calls, that would be amazing as i'd love to discuss this more in real time.
you can find links to our public events calendar and our open office hours call here: https://github.com/npm/rfcs#how-to-join
i should note that i don't have a specific date in mind to discuss package variants, but i join every open office hours call and am happy to use any of them for this discussion. i hope to see you there!
as an alternative because time zones are hard, i'm also regularly available in the OpenJS Foundation slack. you can find a link on this page https://openjsf.org/collaboration/ (scroll down to the Slack heading and you'll see the invite link) and join us in the #npm channel
Describe the user story
Various packages want to ship their releases under different forms. For example:
The current way they achieve that is by the use of third-party tools like node-pre-gyp, prebuild-install, or manual downloads. Those solutions aren't ideal since:
Describe the solution you'd like
Any design has to fit some goals:
Additionally, I believe the following goals should also be followed:
The syntax I currently have in mind is based on the matrix feature for the GitHub actions. Specifically, the package author defines a list of parameter combinations the package offers variants for, and a "package name pattern":
The package manager would then detect this variants field and find the first entry from the include set that matches the runtime parameters (runtime parameters would by default be a static list, but users could add their own; more on that later). For example it would resolve to platform:win32 and napi:4. It would then turn that into @prisma/prisma-build-win32-4 (based on the provided pattern), and finally would fetch it and use it as if the user had listed "prisma": "npm:@prisma/prisma-build-win32-4@x.y.z" from the start.

To make declaring combinations simpler and avoid exponential complexity, a matrix property would also be supported. The package manager would use it to generate by itself the supported parameter sets. The include and exclude properties would thus only be used to declare special cases (here, to indicate that there's a wasm build, and that the package doesn't work on Win32 w/ NAPI 4).

Should the package manager be unable to find a supported parameter set, it would fall back to the current package (in other words, it would ignore the variants field). This is critical, because it's the "graceful degradation" feature I mentioned: package authors would be able to add the variants field to their already-existing packages without having to create a separate package name (that would hurt adoption).

One note though: assuming that fallback packages are "fat" packages that try to build the code from source, then in order to keep the install slim for modern package managers, library authors will likely need to follow a pattern such as this one:
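A minimal sketch of such a layout (an assumption on my part, reusing the array-of-variants shape discussed earlier in the thread; package names are illustrative): the main prisma package carries no artifacts itself, and an unconditional final entry points at the fat @prisma/prisma-fallback package, which is therefore only fetched when no prebuilt variant matches.

```json
{
  "name": "prisma",
  "version": "x.y.z",
  "variants": [
    {
      "pattern": "@prisma/prisma-build-%platform-%napi",
      "matrix": {
        "platform": ["win32", "darwin", "linux"],
        "napi": ["4", "6"]
      }
    },
    {
      "pattern": "@prisma/prisma-fallback"
    }
  ]
}
```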
This would ensure that prisma-fallback would only be fetched if actually needed.

Custom parameters
Packages would be able to provide their own custom parameters:
Those custom parameters would be expected to be set by the dependent via the dependencyMeta field. Using * instead of lodash as the dependencyMeta key would also be supported as a shortcut ("all my dependencies should be ESM").

Cache integration
The user would be able to define in their .yarnrc.yml which parameters should be cached, following the same format as the package parameters. Specifying this wouldn't be required (in which case only the packages covered by the local install would be cached, as one would expect).

Parameter cascade
Cascades would be explicitly denoted by the %inherit value.

Describe the drawbacks of your solution
It seems clear the main drawback is verbosity. The number of lines required may look suboptimal. Keep in mind however that I intentionally kept the code expanded; in practice the code would be shortened (one line per parameter, etc).

It requires publishing multiple variants of the same package to the registry. This is however the very reason for this feature to exist (ie not putting every artifact in the same downloaded package), so I don't see it as a drawback per se.
It also requires those variants to be synchronized in terms of version with the original package (ie, if
prisma
is 1.2.3, then the prebuilt versions will have to be@prisma/prisma-prebuilt-something@1.2.3
as well). This will be a matter of tooling (for example by makingyarn npm publish
accept a--name @prisma/prisma-prebuilt-%platform-%napi
flag which would override the name originally declared in the manifest).Describe alternatives you've considered
Another approach would be to simply leave it to userland. The problem is that userland doesn't have the right tools to interact with the package manager at the level it needs, which makes current solutions awkward at best.
We could also decrease the verbosity by using JS files instead of being manifest metadata. I don't believe this would work because: