
Official strategy for caching of native packages #30139

Closed: nschonni closed this 4 years ago

nschonni commented 5 years ago

This might be more of an NPM concern than a node-core thing, but I think it might be worth having a discussion here. Some of this might be things that are supposed to be replaced by N-API.

Currently in node-sass, I just did a home-built pattern of caching the binding downloads inside the NPM cache. Since our previous release in April 2018, we've had 71M downloads, and without caching each of those installs would have pulled down roughly 3-4 MB.
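
Roughly, the pattern looks like this (the cache layout and helper name here are an illustrative sketch, not node-sass's actual code):

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// npm exposes its cache directory to lifecycle scripts via npm_config_cache;
// fall back to the default ~/.npm location otherwise.
function getCachedBinding(name, version, target) {
  const cacheDir = process.env.npm_config_cache || path.join(os.homedir(), '.npm');
  const cached = path.join(cacheDir, name, version, `${target}_binding.node`);
  return fs.existsSync(cached) ? cached : null; // null => download, then store here
}
```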

It would be good to have a pattern, from the node-gyp perspective, for how native bindings should look up and cache binding files.

/cc @xzyfer

tniessen commented 5 years ago

This discussion seems more appropriate for this repository than for the TSC. Transferred.

nschonni commented 5 years ago

Hi @tniessen, I opened this in the TSC repository since it covers a couple of the repositories under the responsibility of the TSC, and because of this item from the TSC list of responsibilities:

Setting overall technical direction for the Node.js Core project, including high-level goals and low-level specifics regarding features and functionality

tniessen commented 5 years ago

From what I understand, you are looking for a strategy for caching precompiled bindings? This is probably something that needs to be considered by npm or tools such as node-pre-gyp, is that right?

I'll ping the rest of the TSC to make sure moving the issue here was appropriate: ping @nodejs/tsc

joyeecheung commented 5 years ago

What does the "cache" refer to here? Pre-built binaries that do not require local compilation on user machines?

Some of this might be things that are supposed to be replaced by N-API.

Can you elaborate on why N-API is relevant here?

nschonni commented 5 years ago

What does the "cache" refer to here? Pre-built binaries that do not require local compilation on user machines?

Yes, there are some libraries using things like node-pre-gyp (which should maybe just be merged into node-gyp) to pull down bindings if available, falling back to building if they don't exist. Even when these prebuilts can be pulled down, there is no consistent story right now for caching, so installing a package in two different projects on the same machine has to download/build each time.
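
In code, that flow is roughly the following (downloadPrebuilt is a hypothetical stand-in for whatever fetches the platform-specific binary, not any library's actual API):

```js
const { execSync } = require('child_process');

async function installBinding(downloadPrebuilt) {
  try {
    // e.g. fetch a prebuilt .node file for this platform/arch/Node version
    await downloadPrebuilt();
  } catch (err) {
    console.warn('No prebuilt binding available, compiling from source:', err.message);
    execSync('node-gyp rebuild', { stdio: 'inherit' }); // fall back to a local build
  }
}
```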

Can you elaborate on why N-API is relevant here?

With N-API offering a single binding for all OS platforms (unless I'm mistaken there), the thought process might have been to just bundle up the single binding in the NPM package. With 30+ combos without N-API, that was never a viable option.

tniessen commented 5 years ago

Ping @nodejs/node-gyp, but I suspect that this might not be something we can do a lot about; this sounds like an issue with node-pre-gyp.

N-API offering a single binding for all OS platforms

That's not what N-API does. N-API provides an API/ABI that is stable across Node.js versions, but not stable across platforms. (I wish that was possible.)

nschonni commented 5 years ago

Sorry, I think it's getting lost again. node-pre-gyp has one implementation, but it doesn't currently cache; I've built caching manually into a library.

The ask here is a discussion around caching of native bindings in the Node.js ecosystem, whether they are downloaded or built locally. I get that caching is one of those "hard things in CS".

tniessen commented 5 years ago

For reference, for anyone wanting to discuss this: there are already open issues/PRs about this in node-pre-gyp, even though they seem to be stalled:

nschonni commented 5 years ago

Yes, node-pre-gyp has tried to tackle this, but it seems to be a dormant project at this time. I would likely still switch node-sass to use it in the next version, because it has become the de facto standard in the absence of guidance from node-core.

I opened this issue on the TSC repo because I think it needs a discussion at the Node.js architecture level: native modules are a feature of the platform, and caching is something that is easily done wrong.

guybedford commented 5 years ago

If the binary fetch operation is designed to be idempotent, does using a package manager like pnpm mitigate the download cost by allowing it to be shared across projects? If so, perhaps advising your users to use it could be an option?

jasnell commented 4 years ago

Has this been resolved? Does this need to remain open?

nschonni commented 4 years ago

No, it got transferred from TSC, and then nothing else happened

bnoordhuis commented 4 years ago

I've read through the issue twice and I'm still not sure what you want us to do, @nschonni. It seems like you should take this up with node-pre-gyp, dormant though it may be.

jasnell commented 4 years ago

I have to agree, it's non-obvious whether there's anything for Node.js to actually do here. Given that, I'll close the issue. @nschonni ... if you have a concrete proposal for something we can do, we can reopen the issue or open a PR with specific changes.

nschonni commented 4 years ago

OK, I don't really have the energy for this anymore.

tchakabam commented 4 years ago

@jasnell @bnoordhuis Wouldn't this, however, be something where npm (or any other package.json-based package manager) should be allowed to do more of what node-pre-gyp does? In particular, native bindings (for whichever platform/arch combo) should be packageable assets just like JS code in a publish (of course they need to be labeled as to their platform/arch target).

I would definitely say this is a problem that can be solved by adding semantics to package.json (and it would therefore be a Node ecosystem concern) in order to allow native add-ons to be package assets that can be distributed multi-platform: one binary per supported target platform/arch combo.
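
To illustrate, such semantics might look like the following (the nativeAssets field is purely hypothetical, not an existing npm feature):

```json
{
  "name": "some-native-addon",
  "version": "1.0.0",
  "main": "index.js",
  "nativeAssets": {
    "linux-x64": "prebuilds/linux-x64/addon.node",
    "darwin-x64": "prebuilds/darwin-x64/addon.node",
    "win32-x64": "prebuilds/win32-x64/addon.node",
    "linux-arm": "prebuilds/linux-arm/addon.node"
  }
}
```

A package manager that understood such a field could fetch only the entry matching the host platform/arch instead of the whole set.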

Actually, I am pretty bewildered by the fact that, while npm makes use of caching and avoids redundancy on the file system when installing packages, a prebuilt native add-on either needs a node-gyp rebuild on every install or, imo even worse, gets downloaded again on every package install. That is because node-pre-gyp doesn't support caching (which is also absolutely sub-optimal imo with regard to the unnecessary traffic this causes).

But it seems to me that npm (or any other package manager) should actually do this whole job in the first place. Why could it not handle native bindings as well, and with that actually help to sort out multi-platform distribution by standardizing the metadata pointing to prebuilt artifacts within a package, instead of the separate, effectively custom registry that node-pre-gyp by design obliges users to maintain?

EDIT: I am thinking that a package can of course contain any other assets that one would include on publish, and this could be left to the application to sort out: which target OS/arch it is running on, and whether any of those assets can be loaded as a native add-on in that case. I guess what is confusing (from the side of Node.js support/maintenance) is the lack of any "best practice" advice here, or of any other utility layer proposed to do this on the node-require side.

And finally, node-pre-gyp is a sub-optimal tool for many reasons: its opinionated tendency towards AWS S3 as storage, its lack of any caching features, and, by design, its duplication of network requests and the maintenance of an effectively custom registry on top of the npm publish of any package.

@nschonni

With N-API offering a single binding for all OS platforms (unless I'm mistaken there), the thought process might have been to just bundle up the single binding in the NPM package.

While this is wrong (and possibly caused confusion here with regard to your initial post), since you have to provide a binding build for each platform/arch combo that you want to target (example platforms: Windows, Linux, macOS; example archs: x64, arm, ...), your idea goes in the direction of what I would imagine package.json should provide as a standard feature. We would have to allow semantics that label each available platform/arch combo as a package asset, however, so one asset for all is not gonna work; that's what the "native" thing makes us do xD We can be happy, in fact, that there is N-API to cover the runtime ABI side of it :) Otherwise we might as well have to deliver one binding for each Node major, or worse.

@guybedford I don't understand how pnpm would help if package.json semantics don't allow describing native add-ons as assets in order to distribute them multi-platform in the first place?

tchakabam commented 4 years ago

Sorry if my post above was lengthy, but I found that some things needed to be explained to make clear what the real concern is for users of native add-ons, and what the current options in fact are (use node-pre-gyp, or bundle all possible native combos into one package and handle platform/arch discovery on the target).

Ideally, a package manager would be able to identify the platform and selectively download the correct add-on from the registry, it having been identified as such an asset on publish (via package semantics).

One thing to add in defense of using node-pre-gyp, though: it covers the "fallback to build" case, which may be useful for various reasons, invoking node-gyp when no prebuilt is available. The default approach (historically) has been to put node-gyp rebuild in the "install" script of package.json; node-pre-gyp hooks into this, replacing the node-gyp build with a download. It offers selectiveness of the download, which would not be the case if I just put all native target binaries into the package as assets (they would get downloaded as a whole).
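
For reference, the node-pre-gyp wiring looks roughly like this in package.json (module name, path, and host bucket are illustrative values):

```json
{
  "scripts": {
    "install": "node-pre-gyp install --fallback-to-build"
  },
  "binary": {
    "module_name": "my_addon",
    "module_path": "./lib/binding/",
    "host": "https://my-addon-prebuilds.s3.amazonaws.com"
  }
}
```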

What this ticket should be about is not having to run node-gyp (or download from anywhere else) in the first place.

Obviously, we can bundle any assets that we want inside a published package and have them require'd via some JS platform-detection logic exposed as the package entry point. And that would probably be the solution I'd recommend atm to anyone, instead of maintaining another storage location just for the binaries to be downloaded with node-pre-gyp.
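
A minimal sketch of such an entry point (the prebuilds/ directory layout and file names are illustrative):

```js
const path = require('path');

// Pick the bundled binary matching the current OS/CPU at require time.
const target = `${process.platform}-${process.arch}`; // e.g. "linux-x64"
module.exports = require(path.join(__dirname, 'prebuilds', target, 'addon.node'));
```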

What should be left for Node.js to figure out is how to allow package managers to do the selective download of a given target's native add-on from the registry, it having been marked and published as such with the package itself.

tchakabam commented 4 years ago

For the purpose of providing answers to anyone else interested in solutions here: I found that https://github.com/prebuild/prebuildify does exactly what I was describing above in terms of bundling :) In combination with https://github.com/prebuild/node-gyp-build, it allows a bundled add-on for the given platform to be selected and loaded at runtime (see the readmes).
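
Per their readmes, the combination boils down to something like this in the add-on's index.js (with prebuildify --napi run before publishing, and node-gyp-build set as the package's "install" script to get the compile fallback):

```js
// node-gyp-build picks a matching binary from ./prebuilds (created by
// prebuildify) and falls back to a locally compiled build if none matches.
const binding = require('node-gyp-build')(__dirname);
module.exports = binding;
```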

Generally the https://github.com/prebuild/ organization has a lot of interesting tools here.

otoh I understand that node-pre-gyp may be a useful tool for whoever actually wants their native builds to live outside of NPM :)

I hope that explaining all these different cases and existing workflows has been helpful, anyhow.

I think that somewhat standardizing or integrating into the node-gyp toolchain what prebuildify does would make sense.

Toxicable commented 4 years ago

Bit of a fly-by comment: wouldn't it be a bunch easier if the recommended way to do native add-ons was via WASM instead? That way it's cross-platform out of the box, with similar perf (last time I checked). Then NPM wouldn't have to change anything; you can add a WASM module the way NPM currently works, and it'll be cached all the same.
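
For illustration, loading a compiled WASM module in Node needs no native toolchain at install time at all. A minimal sketch (add.wasm and its exported add function are hypothetical; the file would come from compiling the native code with e.g. Emscripten):

```js
const fs = require('fs');

async function load() {
  const bytes = fs.readFileSync('./add.wasm');
  // Instantiate with an empty import object; real modules may need imports.
  const { instance } = await WebAssembly.instantiate(bytes, {});
  console.log(instance.exports.add(2, 3)); // assumes the module exports `add`
}

load();
```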

tchakabam commented 4 years ago

@Toxicable Yes that is a good comment you make there I think.

This exact thought of yours also came to me yesterday during this nightly essay here on "native node things". But I didn't want to add it on top of all this.

I think, though, you have to see that as a separate line of progress: being able to port all C/C++ (or also Rust, Go, ..., any languages compiled to native instructions) onto the WebAssembly platform. That probably won't ever be fully possible, even if a lot of effort can still be made there to enable WebAssembly as a platform.

Without starting to write another "long article" on that here: porting code to and running it on WASM as a platform may impose other constraints and boundaries on the original source code, and on whether it can run the same as in native execution.

From a high-level view, compiling C code to WASM is pretty much compiling to byte-code and executing on a (JS) VM. Performance isn't the huge problem here; but using specific low-level system functions, implemented per platform behind one cross-platform API, like many C libs do, may pose problems on WASM, which may or may not find solutions in the future, bit by bit.

WASM is a great option to solve the whole multi-platform thing in the first place, and many portability questions will certainly get solved that way. But we will always need to keep the option open to use native bindings :)

In fact, all of the effort around N-API and node-gyp exists separately from WASM, and hopefully the two will always exist side by side.

tchakabam commented 4 years ago

To put it into more philosophical words: "WASM is a means of abstraction whose cost lies in runtime constraints, whereas enabling multi-platform build distribution of zero-runtime-cost, cross-platform C libs is an abstraction whose cost lies in the complexity of the distribution channels."