nodejs / build

Better build and test infra for Node.

Native add-on compile build & cache service #151

Closed rvagg closed 5 years ago

rvagg commented 9 years ago

It turns out that @orangemocha and I have been thinking about a very similar solution to the native add-on compile problem.

  1. Accept queries for binaries of packages with the following parameters:
    • Package name (e.g. bignum, leveldown, serialport)
    • Package version (e.g. 1.0.2)
    • Platform (e.g. linux, win32, darwin)
    • Architecture (e.g. x86, x64, armv7)
    • Node ABI version (i.e. process.versions.modules, a.k.a. NODE_MODULE_VERSION)
  2. If no matching binary exists and the combination is acceptable, build it
  3. Serve the binary or serve a rejection for <reason>
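
To make the shape of such a query concrete, here is a minimal sketch of how a client might assemble one (the host and endpoint are invented for illustration; the parameters are the ones listed above):

// Hypothetical query for a prebuilt binary; the host and path are
// placeholders, the parameters are the ones listed above.
const querystring = require('querystring');
const query = querystring.stringify({
  name: 'leveldown',              // package name
  version: '1.0.2',               // package version
  platform: process.platform,     // 'linux', 'win32', 'darwin', ...
  arch: process.arch,             // 'x86', 'x64', 'arm', ...
  abi: process.versions.modules   // Node ABI version (NODE_MODULE_VERSION)
});
console.log('https://addon-build-service.example.org/binary?' + query);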

Some additional thoughts about getting something like this working:

/cc @nodejs/build, @springmeyer, @othiym23, @nebrius (pls unsubscribe if you don't want to be involved in this discussion, I'm just guessing)

nebrius commented 9 years ago

I think this sounds really great, I hope we can get some forward momentum on it.

Are you envisioning the query described in step 1 as part of the NPM install process? That is:

  1. The user types npm install.
  2. The NPM client queries this service running on Azure.
  3. The service returns the binary if it has it cached.
  4. If not, the service builds it on the fly (if it has the correct build configuration) and caches that build.
  5. Otherwise it returns an error to the NPM client.

Did I summarize that correctly?

othiym23 commented 9 years ago

This sounds fantastic and aligns with a bunch of stuff on npm's road map. How can npm help?

jbergstroem commented 9 years ago

This also makes the assumption that node/io.js has been built a specific way (i.e. no shared libraries).

Edit: as well as linked dependencies being met (mysql, postgres, nan, etc)

orangemocha commented 9 years ago

Really looking forward to making this happen :+1: The native module compile issue has been a thorn in my side for quite some time.

I will start by looking at the Azure side of things. That is, unless someone knows of a cloud provider that can virtualize OSX/Darwin.

orangemocha commented 9 years ago

@othiym23 when a module is published to npm, npm makes no guarantees that it doesn't contain any malicious code, right? I suppose the same guarantees (or lack thereof) will have to apply to the prebuilt version.

rmg commented 9 years ago

+1 on using foundation money to pay for an OS X pseudo IaaS since it also solves the trust problem with access to private networks.

rmg commented 9 years ago

Three groups come to mind that would benefit from a service like this, but one of them might not even be able to use it.

  1. Slow environments like ARM where any reduction in work during module install would be appreciated.
  2. Users who don't have a build environment because it is too difficult to install/configure.
  3. Users who don't have a build environment because they are prohibited from installing one.

The 3rd group, unfortunately, also seems the least likely to be allowed to use such a service. Is there anything that can be done to make these binaries trustworthy to those users? Some combination of signed npm packages and signed binaries maybe? @othiym23 I didn't see anything like that (signing) on the npm roadmap (I was only skimming), should it be?

indexzero commented 9 years ago

Many moons ago @bmeck and I wrote a few modules around this with this very use-case in mind. In particular it deals with cross-platform, multiple architecture, and multiple node versions from a single source. Might be useful for these current efforts.

cjbj commented 9 years ago

As a native add-on owner, these are issues/requirements I see:

rvagg commented 9 years ago

Thanks @cjbj, great input on all counts, particularly re integration with alt build services. Can you elaborate on your point about licenses though, and how it relates specifically to this effort?

cjbj commented 9 years ago

@rvagg Pretty much any software you will serve has a license. Lawyers like click-throughs for lawyerly reasons and will quote precedents ad infinitum.

Allowing add-ons to choose to have a click-through before their binary is downloaded will let owners satisfy any legal restrictions, either for the add-on code itself, or for its third-party dependencies.

If click-through support isn't feasible to implement, then the second-best option is to make sure the add-on license(s) are visible or accessible prior to serving any software. Either the full license text or an SPDX identifier should be accessible.

Of all the points in my previous post, having click-through support is the most important to me.

nebrius commented 9 years ago

@cjbj just to clarify, is click-through a feature you would like to see in the NPM client?

cjbj commented 9 years ago

@nebrius absolutely yes.

othiym23 commented 9 years ago

@orangemocha / @rmg I'm overdue to add a section on trust and provenance to the npm CLI road map, but this is something npm, Inc has been thinking about for a long time. It's a big, cross-team project that's hard to get right and easy to mess up, which I think is part of why we keep letting it get pushed down the priority queue.

I think the most relevant thing here is that npm is planning on adding some notion of signed packages and identity management (at some undefined point in the future), but that this kind of delegated build system can and should manage trust independently of that, i.e. npm-the-service should probably not be delegating authority to an external service.

That's not set in stone, though, and we're still open to input (that doesn't require npm users to use GnuPG or generate X.509 certificates 😎). I'd be interested to hear if @evilpacket has any thoughts on this.

rmg commented 9 years ago

@othiym23 agreed, and I hear you on how things like that seem to get pushed down the priority queue.

I didn't mean to suggest that there be any delegation, just that we need the building blocks (signed packages) in place to have any hope of at least considering a chain of custody proof. Glad to hear it's at least in the queue somewhere (would have honestly been surprised if it wasn't).

springmeyer commented 9 years ago

Thanks for the /cc @rvagg - I'm interested in supporting this effort and being a resource. I'm primarily coming from the perspective:

I'm mostly offline currently and over the next couple weeks due to some family issues. But I'm available by email (dane@mapbox.com) or phone if anyone here wants to pick my brain or hear about the pain points with node-pre-gyp and building binaries on appveyor and travis. Overall I'm committed to maintaining node-pre-gyp as https://www.mapbox.com/ depends on it heavily. But I would also be keen for an official solution to completely replace node-pre-gyp and allow it to go away eventually.

tohagan commented 9 years ago

Excellent news! I started the thread above to request a solution to this issue for NodeJS and am delighted to see a well-thought-out approach that not only avoids local builds but also speeds up and CIs the whole deployment process, so we get consistent, uniform builds. Please consider some of my suggestions in this thread for lobbying for code signing and signature verification. At a minimum we should at least use SSL for all binary downloads. We might also be able to use certificate pinning if a limited set of download servers is employed (CDN?). Without this we simply can't establish the trust required for many deployment scenarios. Thanks again!

tohagan commented 9 years ago

Just had a thought ... Since it's really a CI service, it would make sense, particularly if Azure did not come to the party, to approach Travis CI. I think it would fit their existing business model very well and align with their generous open source commitment. Integration would probably also be the least effort for NodeJS developers, since most would be using Travis already.

rvagg commented 9 years ago

Travis just started offering a beta of OSX support, so suddenly they are a more interesting party in this discussion.

tohagan commented 9 years ago

If they are historically a Linux and now OSX shop, they may feel some reluctance to consider CI on Windows, but really it's just a matter of hiring someone with the right ops/security expertise and seeing that there is a clear business case. In approaching them, I'd point out that binary builds are a pain point for almost all dynamic languages and that their business opportunity is much wider than just NodeJS. I think it would be a win-win for them and us. We'd get all the public NodeJS binary open source code built and downloadable, and it would be paid for by their customers who need their private projects and products CI'd on the same system. Travis has already proven that it works.

bjouhier commented 9 years ago

Cool idea! Very useful for people like us who have to support several target platforms.

orangemocha commented 8 years ago

I have been investigating this further, with lots of help from @joaocgreis, to try and clarify some of the design aspects and come up with a plan for delivering a working solution. We iterated over a few approaches, some of which turned out to be dead ends but now we think the story is good enough, so we would like to share our findings and get as much feedback as possible before we start implementing a prototype. @nodejs/npm, a review from someone at npm would be especially helpful.

/cc @nodejs/build @nodejs/addon-api @nodejs/node-gyp

Goals

The main goal of this effort is, as already stated in this issue, to implement a service that would compile native modules on behalf of end users, to remove the need for compilation on the user's machine during npm-install. This should sound pretty uncontroversial, but I explicitly want to call out that anything outside of npm-install is a non-goal. In particular, all issues related to configuring node-gyp by the module author at development time are beyond the scope of this proposal.

Another important goal, or opportunity, is to deliver a solution that would ease the pain of users very soon - at least in the vast majority of use cases. There are multiple ways to skin this cat. The "right" long-term solution from a design perspective would probably entail an ABI-stable abstraction layer in the native modules API, but that might take years to get implemented and adopted. If there is a way to relieve the pain quickly, that's the direction that I think we should focus on with this build service effort. This assumption has dictated the design approach that we have undertaken for the first incarnation of the service, as detailed below.

More goals:

It would be nice to include support for private modules, but at this point I am not sure if that's feasible. That's something that we still need to explore. I am assuming that we can live without it, at least in the beginning.

The node-pre-gyp approach (tabled for now)

One of the first solutions we considered was to leverage all the great work that has already been done in node-pre-gyp. node-pre-gyp is designed to address the very issue that we are trying to solve, so it was a natural choice. It already defines a workflow for downloading modules as binaries, with possible fallback to compilation. The part where it falls short is that it requires module developers to set up their own deployment sites (though it integrates easily with AWS) and also to manually rebuild modules for new versions of Node, or set up their own automation to do it. The approach we considered was to extend node-pre-gyp to use the module build service, so that module authors wouldn't have to manage their own compilation and distribution.

While this seems like a reasonable solution, it does require a modification of existing modules and an opt-in by module authors, at a minimum to make their module use a modified version of node-pre-gyp. Since this is outside of our control, and it could take a long time, we set this approach aside and instead started investigating possible ways to make the module build service provide compilation for existing modules, without requiring any module modification or opt-in.

Supporting existing modules without modification

In order to support existing native modules without any modifications, we need to make some changes in the npm stack, either on the client or the server, or both.

The burden will then be on us (the build service maintainers) to test all the modules and whitelist the ones that are supported. We can have additional server-side configuration to define different modalities so that we can support as many modules as possible, and even special case a few. There are ~1750 native modules, with the top ~150 most downloaded accounting for 99% of all native downloads, so going down the list to validate whether they work with the service and configure the service appropriately for each module seems like a feasible task.

The module consumer will be able to choose whether they want to use the service, through a configuration setting or a flag. For illustration purposes: npm install --use-module-build-service. For pure JS packages, this will behave like regular npm install. For native modules, instead of downloading the package and compiling the source to a native library, it will delegate compilation to the server and download a snapshot of the results, as if the compilation occurred on the client.

Current npm-install workflow

Disclaimer: we poked around npm but didn't go into too much detail, so it's possible that some elements in the description below might slightly differ from the actual implementation. The overall picture should be pretty close though. [@nodejs/npm please let me know if I missed anything important here]

When npm installs a module, it first fetches the registry entry for that module, which is a JSON file that pretty much includes the package.json for all the versions of the module (see here for an example). It will first look for it in a local cache, and if it's not already there, download it from registry.npmjs.org.

The client then does some magic to decide which version to install, downloads the corresponding tarball (which also may be cached), extracts it and executes any preinstall/install/postinstall scripts as specified in the scripts section of package.json.

Native compilation typically happens by means of npm invoking node-gyp rebuild in one of the install scripts, but this can be specified in multiple ways. The module author can invoke node-gyp rebuild from one of the scripts, or even from inside arbitrary shell scripts invoked by npm scripts. If the install and preinstall scripts are empty and there is a .gyp file in the root of the package, npm will automatically set the install script to node-gyp rebuild.

The package definitions (both in the registry entry and package.json in the tarball) contain a gypfile property, which is set by npm (before publishing?) if there is a .gyp file in the root. This seems like a reliable indication that the module is potentially native. I am saying 'potentially native' because a module can have the gypfile property set without actually invoking any install scripts or having a gyp file. We found at least one instance of a false positive.
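
For illustration, a rough sketch of that detection step against the public registry (the heuristic is exactly the gypfile check just described, false positives included):

// Sketch: fetch a module's registry entry and check whether a given
// version is potentially native (gypfile === true).
const https = require('https');

function isPotentiallyNative(name, version, cb) {
  https.get('https://registry.npmjs.org/' + name, (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      const pkg = JSON.parse(body).versions[version];
      cb(null, Boolean(pkg && pkg.gypfile));
    });
  }).on('error', cb);
}

isPotentiallyNative('leveldown', '1.4.2', (err, native) => {
  if (!err) console.log('potentially native:', native);
});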

Modified npm-install workflow (for the first prototype)

For the initial prototype, we are striving to minimize the changes in the npm client, so that it will be easier to maintain. Once the prototype proves its value and we have had the opportunity to learn from some real-world usage, we can look at how to extend the npm client syntax to support the changes in a more correct way, and/or perhaps go down the node-pre-gyp approach.

The following description is only meant to give an overview of the modified workflow. Details may vary in the actual implementation.

The user runs:

npm install -g npm-precomp
npm-precomp install native-module-xyz

npm-precomp is a modified version of the npm client, which defaults to using the module build service. npm-precomp behaves exactly like npm, except for when you use the install command and the module is native.

After having retrieved the registry entry for the module, the client knows that the module is potentially native (gypfile is true). Instead of fetching the official tarball, it sends a request to the build service, providing the module name, and all the parameters that uniquely identify the client configuration from the build service perspective (listed in https://github.com/nodejs/build/issues/151#issue-99730733, point 1).

To handle this request, the build service examines its configuration for the given module and platform. Some platforms or some individual modules might not be supported (for various reasons). In that case, the service returns a "not supported" response and the client falls back to the standard workflow, which might include compilation.
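
On the client side, that fallback might look roughly like this (a sketch: the service URL is a placeholder, and using HTTP 404 for "not supported" is an assumption made for illustration):

// Sketch of the client-side flow in npm-precomp: ask the build service
// first, fall back to the standard (compiling) install on rejection.
const https = require('https');
const querystring = require('querystring');

function fetchPrebuilt(name, version, cb) {
  const query = querystring.stringify({
    name: name,
    version: version,
    platform: process.platform,
    arch: process.arch,
    abi: process.versions.modules
  });
  https.get('https://addon-build-service.example.org/build?' + query, (res) => {
    if (res.statusCode === 404) return cb(null, null); // not supported: fall back to regular npm install
    if (res.statusCode !== 200) return cb(new Error('HTTP ' + res.statusCode));
    cb(null, res); // res streams the precompiled tarball
  }).on('error', cb);
}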

If the module is supported and it's already been compiled, then it's served from the service cache. Otherwise an environment is spun up in a slave VM/container that is suitable to perform the compilation on behalf of the client. The slave pretty much executes the same npm install, then stores a snapshot of the resulting folder into a tarball and returns it to the client, while also adding it to the cache for later re-use.

The build service also needs to modify the scripts section in the package.json for the precompiled package, so that any invocations of node-gyp rebuild are removed. Since node-gyp rebuild can appear in a few places, and since there can be additional commands in those scripts, how to rewrite the scripts will need to be configurable for each module. We'll need to be able to specify which portions of the scripts need to be run on the server or on the client.
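
For illustration, a minimal sketch of that rewriting, under the naive assumption that a script containing node-gyp rebuild does nothing else (real modules will need the per-module configuration described above):

// Sketch: drop install scripts that invoke node-gyp rebuild from the
// precompiled package's package.json. Naive: node-gyp can be buried
// inside arbitrary shell scripts with extra commands around it.
function stripNodeGypScripts(pkg) {
  const scripts = Object.assign({}, pkg.scripts);
  ['preinstall', 'install', 'postinstall'].forEach((name) => {
    if (scripts[name] && scripts[name].indexOf('node-gyp rebuild') !== -1) {
      delete scripts[name]; // assumes the script does nothing else
    }
  });
  return Object.assign({}, pkg, { scripts: scripts });
}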

Back on the client, it is also important that the precompiled tarball doesn't get stored in the regular npm cache, so that npm-precomp install doesn't interfere with the normal npm install.

[Implementation note: the npm client currently executes the scripts as specified in the registry entry, not the package.json in the tarball. The above workflow seems simpler, so we'll try to modify npm to read scripts from package.json. Otherwise, we'll need to implement a slightly different request/response pattern.]

Maintaining the modified npm client (npm-precomp)

One obvious way of accomplishing this would be to maintain a temporary fork of npm. However, this would require us and our users to keep updating it as new versions of npm are released. So we are pursuing a different approach instead. npm-precomp can be just a wrapper around whatever npm you have installed on your machine, and inject its custom behavior where needed, as seen here (warning: severe hack ahead!): https://github.com/janeasystems/npm-inject-test/blob/master/index.js.

Scaling the service up and down

We can expect the service workload to be bursty. Whenever a new ABI-changing version of Node is released all modules will need to be recompiled for that version, so there will be a peak in the service load. As precompiled packages get cached, the compute usage will slowly return to normal levels.

To handle this usage pattern, an elastic use of resources is a must. New environments will be spun up in parallel when compilations are requested, up to certain limits. Since we want to control the total expenditure while still allowing for bursty usage, we will define the limits in terms of resources used within a given time period, e.g. maximum resources used within an hour, maximum resources used within a month. We might also want to specify limits per module and per platform. Once a limit is reached, the corresponding requests get put in a queue. If the service cannot sustain the load, some requests will eventually time out.
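
A minimal sketch of such a windowed limit (the numbers and drain policy are placeholders, not a proposed policy):

// Allow at most `max` builds per rolling window; queue the rest.
class BuildLimiter {
  constructor(max, windowMs) {
    this.max = max;
    this.windowMs = windowMs;
    this.starts = []; // timestamps of recently started builds
    this.queue = [];  // builds waiting for capacity
  }
  request(startBuild) {
    this.queue.push(startBuild);
    this.drain();
  }
  drain() {
    const now = Date.now();
    this.starts = this.starts.filter((t) => now - t < this.windowMs);
    while (this.queue.length && this.starts.length < this.max) {
      this.starts.push(now);
      this.queue.shift()(); // start the build
    }
    if (this.queue.length) setTimeout(() => this.drain(), 1000); // re-check later
  }
}

const limiter = new BuildLimiter(10, 60 * 60 * 1000); // e.g. 10 builds per hour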

Storage is much cheaper than compute resources, so we'll be caching precompiled modules extensively. Compilation environments will be spun up dynamically and shut down shortly thereafter. At times when no compilations are in progress, the service should use no compute resources other than for the entry point itself.

CDN for scalability, DDoS protection etc

In order to make the service scalable, we'll use a CDN for serving requests as much as possible. The precompiled tarballs will be stored in a CDN.

Although we haven't looked at any specific CDNs yet, the hope is that even the initial request for the tarball can be sent to the CDN first, and that we can leverage the CDN for caching by specifying the caching policy in the response from the service to the CDN.
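
Since a precompiled tarball never changes for a given name/version/platform/arch/ABI tuple, the service could mark its responses as effectively immutable; a minimal sketch (header values are illustrative, not a decided policy):

const http = require('http');
http.createServer((req, res) => {
  // The tarball for a given name/version/platform/arch/ABI tuple never
  // changes, so the CDN can cache it aggressively.
  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('Cache-Control', 'public, max-age=31536000, immutable');
  res.end(); // the tarball bytes would be streamed here
}).listen(8080);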

A CDN might also provide mitigation to DDoS attacks.

Security considerations

Trust between npm and the module build service

Since the build service will act as a proxy for multiple npm clients towards npm, and hence the source of more traffic than the average client, we will want to make sure that our friends at npm have a scheme in place to not mistake the build service for an attacker.

The build service will inherently trust npm and rely on whatever mechanism it will provide to verify the authenticity of modules (https://github.com/nodejs/build/issues/151#issuecomment-130740382).

Package integrity checks

We can compute fingerprints of precompiled packages, and perform additional integrity checks at various points (including on the client) to reduce the risk of precompiled packages being manipulated in storage or during transfer.
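
A sketch of the fingerprinting step, using SHA-256 (the tarball filename is hypothetical):

// Compute a fingerprint of a precompiled tarball on the server, ship
// it alongside the tarball, and re-verify on the client before unpacking.
const crypto = require('crypto');
const fs = require('fs');

function fingerprint(tarballPath, cb) {
  const hash = crypto.createHash('sha256');
  fs.createReadStream(tarballPath)
    .on('data', (chunk) => hash.update(chunk))
    .on('end', () => cb(null, hash.digest('hex')))
    .on('error', cb);
}

// hypothetical cached-tarball name
fingerprint('precomp-leveldown-1.4.2-linux-x64-abi46.tgz', (err, digest) => {
  if (!err) console.log('sha256:', digest); // compare against the published value
});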

Slave integrity

Build slaves will be executing arbitrary code (as specified by the module author) when running npm install. Slaves will have to be hosted in a sandboxed environment, and they will need to be reset to a known state before each build.

Support for private registries

The first version isn't likely to include support for private registries, but this is something that we should consider adding in the future. Just as you can operate your own private npm registry within corporate walls, you would be able to run your own build service. At a minimum, this will require being able to configure where/how slaves are spawned, and where the tarballs are stored.

Other approaches considered and ruled out

A failed attempt to avoid changes in the client

We had hoped that we could leverage the --registry npm flag to avoid having to make changes in the client.

The idea was to clone the npm registry and keep it up to date via CouchDB synchronization. Clients would hit the service by specifying its URL with the registry flag/setting. On the server we would run a modified version of npm-registry-couchapp that could handle the precompilation for supported modules, or defer to the official npm service for everything else.

Not having to change the npm client sounded very enticing. But alas, the npm client doesn't send the parameters needed to identify the client configuration, and without those we don't know what to compile for. While we thought it would be acceptable to include the platform and arch parameters in the registry URL, it would be a poor solution for the Node ABI version number.

Paying for operational costs

In the beginning we'll make this service available for a subset of modules and platforms, to make sure we can fit the costs into a preset budget. One of the goals of this initial phase will be to measure the operational costs of the service and make a projection of what it will cost to open it to a larger set of modules/platforms. In terms of modules, we will enable building the top X most downloaded modules (with X to be determined). In terms of platforms we'll certainly want Windows in there, because it's the platform that causes the most grief to users of native modules. It would be nice to also have support for OSX, if we can overcome the technical challenge of virtualizing it and get access to enough hardware resources for it.

This initial phase will be sponsored by Microsoft. Once we have assessed the demand for the service, and have gathered more data about its cost, we can figure out the best way to sustain it in the long term.

Next steps

othiym23 commented 8 years ago

@orangemocha I'll try to have a response to you before the end of the week, but there's a lot in here (& I also have a lot going on), so it may take me a few days. Thanks for putting this together!

orangemocha commented 8 years ago

Sounds good @othiym23 , thank you!

mhdawson commented 8 years ago

In what you have outlined, I assume there would be a way to specify different back ends to npm-precomp, so that a company could, for example, provide their own inside their firewall?

orangemocha commented 8 years ago

Yes. Even though I don't have any data on the usage of private registries, I assume that it would be valuable to support that scenario, or at least provide a path to it.

It's a non-trivial aspect though, and additional input from people who use private registries would be helpful.

Assuming that for the public service we'll host the build slaves on a cloud provider (e.g. Azure or AWS), setting up a similar mechanism for slaves to run within corporate walls would require a considerable effort and I am doubtful that many organizations would find such effort justified by the benefit of overcoming the npm-install hurdles. Perhaps compilation in the public cloud with the ability to configure a private account for it would be a reasonable compromise? The service entry point could run within corporate walls, fetch the sources internally, and push them privately to the cloud slaves for compilation.

JCMais commented 8 years ago

What about packages that have dependencies on some libraries being present?

For example, node-oracledb requires the Oracle client libraries, how would that be possible with that alternative?

Edit: nvm, found it.

We can't support all native add-ons, consider node-canvas as one case where there are system dependencies that are necessary to both compile and use it. Unless a native add-on can ship its dependencies via npm it's going to be difficult for us to support it.

But anyway, any ideas for a workaround here?

tohagan commented 8 years ago

Thanks once again for the thought, planning and collaborative effort you've put into this. Deeply appreciated! A couple ideas in case they are of use ...

I suspect that the private builds will be your best bet for paying for the service long term. It would be great to get a measure of market interest in this to help you assess long term viability based on these principles. I dare say you already know this.

In case you find the need to sync part of the npm CouchDb database, you may be interested to know that CouchDb 2.x (which I suspect the npm team will be keen to use for its clustering features) is planning to support a changes feed for indexed views that will support efficient filtered replication (I'm thinking filtered by gypfile === true). Further down the track AvanceDB may also be of interest.

indexzero commented 8 years ago

@orangemocha I'll start off by parroting everyone else and say great job on an in-depth spec. As I mentioned previously in this thread, @bmeck and I have tackled similar problems in module-foundry and module-smith. Wanted to share learnings from that:

The build service will inherently trust npm and rely on whatever mechanism it will provide to verify the authenticity of modules.

One detail you might want to be aware of is the unsafe-perm flag, which is responsible for UID/GID flipping when running npm commands. We defaulted to nobody and nogroup (see the relevant code in moduleSmith.prepareRepository), which is probably the best approach for this service as well, to strip unwanted privileges from the npm process.
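
A sketch of that privilege stripping when spawning the build's npm process (the uid/gid values and working directory are assumptions; 65534 is a common convention for nobody/nogroup):

// Run the build's npm process as an unprivileged user.
const { spawn } = require('child_process');

const child = spawn('npm', ['install', '--unsafe-perm=false'], {
  cwd: '/build/workdir', // hypothetical per-build directory
  uid: 65534,            // nobody
  gid: 65534,            // nogroup
  stdio: 'inherit'
});
child.on('exit', (code) => process.exit(code));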

There are also two deceptively complex features of this service that I wanted to highlight:

1. Supporting multiple Node ABI versions?

Otherwise an environment is spun up in a slave VM/container that is suitable to perform the compilation on behalf of the client. The [minion] pretty much executes the same npm install, then stores a snapshot of the resulting folder into a tarball and returns it to the client, while also adding it to the cache for later re-use.

Spinning up an individual container per build seems pretty sane. Do you intend on spinning up a container per build and per Node ABI version? In module-smith we were able to build a module for multiple versions of node and arch from the same node process by using specific npm environment variables:

npm_config_nodedir
npm_config_user
npm_config_cache
npm_config_node-version
npm_config_registry
npm_config_arch

There may be additional environment variables that are now important, but if you're going to build multiple versions side-by-side in the same container, each one will need its own npm_config_nodedir and npm_config_cache to prevent side effects between node versions. There is relevant code in moduleSmith.getBuildDescription which outlines these env vars.
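
For illustration, a sketch of driving two such builds from one process using the variables above (paths and versions are placeholders):

// Build the same module for a given target Node version by pointing
// npm at per-version headers and a per-version cache.
const { spawn } = require('child_process');

function buildFor(nodeVersion, nodedir, cacheDir) {
  return spawn('npm', ['install'], {
    env: Object.assign({}, process.env, {
      npm_config_nodedir: nodedir,            // headers for the target version
      npm_config_cache: cacheDir,             // per-version cache avoids cross-talk
      'npm_config_node-version': nodeVersion,
      npm_config_arch: 'x64'
    }),
    stdio: 'inherit'
  });
}

buildFor('4.2.1', '/opt/node-headers/4.2.1', '/tmp/npm-cache-4.2.1');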

2. Windows support?

If this is taken on, it would be a huge win for node users on Windows. Configuring all the necessary prerequisites on Windows is significantly more difficult than on other platforms, and was one of the biggest (if not the biggest) pain points node users were reporting to Microsoft in 2013 (this may have changed since then).

At the time they contracted us to add Windows support to module-smith and module-foundry. Hopefully some of that work is still useful / relevant here. The complexities here are pretty nuanced:

There is yet more relevant code in moduleSmith.spawnNpm

Hope that you find this information useful and looking forward to seeing what you come up with!

distracteddev commented 8 years ago

@indexzero Do you happen to know if module-foundry/module-smith can still be used today? Or is there another userland module/system that accomplishes a similar task? Even something simple that only works on a pre-configured environment would work for my particular use case (just trying to speed up some Docker builds where the main bottleneck is building native modules).

dead-claudia commented 7 years ago

Will there be a way to optionally disable this (or use a pre-existing binary)? That would be very useful for Electron/NW.js users that also need native modules.

Mithgol commented 7 years ago

@isiahmeadows …if the build system's source code becomes open as originally planned, then Electron's / NW.js's teams might set up their own build farms and reuse the same npm features (however, npm features would have to be designed beforehand with that possibility in mind).

NW.js in particular has been reported to be able to reuse Node.js binary modules (on Mac and on Linux) without changes when the ABIs of Node.js and NW.js match. (According to the NW.js docs, an LTS release is required for such a match.) Therefore we might expect NW.js to benefit immediately from Node's builds (though only on non-Windows systems and only for specific NW.js versions).

refack commented 7 years ago

We can hook such a service into GYP (or gyp.js) with key = hash of all the input files + platform + toolset. This gives us an unmodified install flow, and another use case:
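
A minimal sketch of that key computation (the file list and toolset name are placeholders):

// Cache key = hash over all gyp input files plus platform and toolset.
const crypto = require('crypto');
const fs = require('fs');

function cacheKey(inputFiles, platform, toolset) {
  const hash = crypto.createHash('sha256');
  for (const file of inputFiles.slice().sort()) { // stable order
    hash.update(file);
    hash.update(fs.readFileSync(file));
  }
  hash.update(platform);
  hash.update(toolset);
  return hash.digest('hex');
}

console.log(cacheKey(['binding.gyp', 'src/addon.cc'], process.platform, 'msvs-2015'));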

Trott commented 5 years ago

Closing due to long period of inactivity. Feel free to re-open if this is a thing. I'm just trying to close stuff that has been ignored for sufficiently long that it seems likely it's not something we're going to get to.