nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Feature Request: Distributable Artifacts #11903

Closed bmeck closed 5 years ago

bmeck commented 7 years ago

This is a proposal to add a feature to Node's core that can create and use single files as a means of distributing packages. This document presents use cases but gives no implementation details. This is a rather large undertaking and should be delivered in smaller parts than the whole picture given here. EPs that include exact implementation details should be created as this work is agreed upon and split into smaller chunks.

This issue has been created as a successor to the node --install.

Use cases

Application users

People wishing to distribute their node.js applications face a tough time currently. There is no standard way in which to distribute these. Docker is increasingly common, but so is telling people to go install dev tools such as npm on their machine and run something from the CLI. Compare this to Electron or Java .jar files which require double clicking a single file.

This case notably has the following features of interest:

| feature | requirements |
| --- | --- |
| global installation | well known directory configuration |
| global installation | standards for creating scripts invoking node properly on diff OS |
| double click | CLI file extension support |
| double click | installer based OS file extension registration |
| binaries | build support like _third_party_main (native addon statics like process._linkedBinding or through extraction) |
| binaries | random access in file |
| binaries | trailing headers |
| binaries | API to access assets (a Resource API) |

There are concerns here that the way node-gyp works is not compatible with statically linking binaries in a dynamic manner. In existing research such as noda, the solution was to extract the .node file to disk as a temporary file and then load it. For truly dynamic distributables that cannot be statically linked, this approach is commonplace.
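The extract-then-load approach mentioned above can be sketched roughly as follows. This is a minimal illustration and not noda's actual code: `extractToTemp` and `loadEmbeddedAddon` are hypothetical names, and the addon bytes are assumed to have already been read out of the distributable.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write embedded bytes into a fresh temporary directory
// and return the path of the file that was created.
function extractToTemp(bytes, fileName) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'distributable-'));
  const target = path.join(dir, path.basename(fileName));
  fs.writeFileSync(target, bytes);
  return target;
}

// Hypothetical helper: extract a native addon and load it from disk.
// Calling require() on a path ending in .node loads it as a native addon.
function loadEmbeddedAddon(bytes, fileName) {
  return require(extractToTemp(bytes, fileName));
}
```

A real implementation would also need to decide when the temporary file is cleaned up, which this sketch leaves open.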

Runtime plugins

Node.js applications have no standard way to declare how application plugins should be created. Commonly this involves installing devDependencies, but for plugins that are truly determined at runtime this does not make sense. Examples of runtime plugins are things such as servlets or Minecraft-like addons that are not depended upon by the application intended to run the plugin.

In well designed cases you can simply drop a .dll or .jar into a folder or onto a GUI and have an application know what to do. This should be easier and less home grown for the common case when developing on Node.

This case notably has the following features of interest:

| feature | requirements |
| --- | --- |
| plugin loading | require/import support |
| uniform format | choosing a file format with potential for future meta-data |

Verification of distributables

Due to security or connectivity, verification of the integrity of node.js modules is difficult. There is no clear standard for signing of the code, nor is there a distribution format to even create a digest from.

Offline distribution is also important in places with low fidelity connections or no connections. Rural locations, network limited secure environments, etc. all would benefit from having a standard of both distribution and verification.

This case notably has the following features of interest:

| feature | requirements |
| --- | --- |
| verification of signatures | CA/Keystore standard |
| verification of signatures | CLI support |
| creation of signatures | CLI support |
| low connectivity | offline capable CA/Keystore |

Isolation

It should not be possible for other modules to modify the manner in which snapshot internals are loaded. This allows a rudimentary guarantee of evaluation order and dependency resolution but does not intend to prevent mutation of global state.

It may be prudent to restrict the ability to load certain modules (fs, child_process) while within a distributable; doing so could potentially be configured by the CA/Keystore standard.
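One way to picture such a restriction is a wrapper around `require` that refuses a configured set of module ids. This is only a sketch under assumptions: `makeRestrictedRequire` is a hypothetical name, the deny list is hard-coded rather than coming from a CA/Keystore policy, and wrapping one entry point like this is not a sandbox.

```javascript
'use strict';

// Hypothetical helper: wrap a require function so that a configured set of
// module ids cannot be loaded through it.
function makeRestrictedRequire(baseRequire, denied) {
  const blocked = new Set(denied);
  return function restrictedRequire(id) {
    // Treat 'fs', 'node:fs' and 'fs/promises' as the same top-level module.
    const name = id.startsWith('node:') ? id.slice(5) : id;
    const root = name.split('/')[0];
    if (blocked.has(root)) {
      throw new Error(`Loading "${id}" is not permitted in this distributable`);
    }
    return baseRequire(id);
  };
}

const restricted = makeRestrictedRequire(require, ['fs', 'child_process']);
```

With this in place, `restricted('path')` returns the usual module while `restricted('fs')` throws.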

Mutability

In order for verification to occur, the internal state of a distributable must not have changed. If the only time required for verification is at install, extraction to disk is viable as any mutation after install is allowed.

If verification needs to be done when the runtime loads a distributable, an immutable state needs to be set up for a distributable.
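The link between verification and mutability can be illustrated with Node's built-in `crypto` module. This is a sketch only: the Ed25519 key pair is generated on the spot as an assumption, whereas a real policy would obtain the public key from whatever CA/Keystore standard is chosen, and `signDistributable` and `verifyDistributable` are hypothetical names.

```javascript
'use strict';
const crypto = require('crypto');

// Assumption: keys generated locally purely for illustration.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

// Sign the exact bytes of the distributable. Ed25519 takes no digest name,
// hence the null algorithm argument.
function signDistributable(bytes, key) {
  return crypto.sign(null, bytes, key);
}

function verifyDistributable(bytes, key, signature) {
  return crypto.verify(null, bytes, key, signature);
}

const original = Buffer.from('contents of app.distributable');
const signature = signDistributable(original, privateKey);

// Flipping a single byte after signing makes verification fail.
const tampered = Buffer.from(original);
tampered[0] ^= 0xff;

console.log(verifyDistributable(original, publicKey, signature)); // true
console.log(verifyDistributable(tampered, publicKey, signature)); // false
```

Any mutation between signing and loading invalidates the signature, which is why load-time verification requires the contents to stay immutable.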

Creation of reproducible dependency graphs

One of the largest problems with tooling in node modules is the lack of reproducible builds. Even with so-called lock files, the lack of a source-of-authority mechanism and of signatures can lead to problems, because naive things like checksums and name/version pairs are used instead. Creating a fully standardized bundle format that includes all the assets of a module or application would prevent the need for home grown solutions.

With a standard format in place it would also be possible to do a full audit of a dependency graph without having custom tools or having to recreate the entire dir structure on disk.

This case notably has the following features of interest:

| feature | requirements |
| --- | --- |
| cross environment | format should not be plaintext |
| cross environment | CLI should be able to produce and extract distributables w/o OS tools |
| symlinks | format should support |
| full tree | format should be a full snapshot of the fs, not a manifest |
| full tree | API to access assets if distributable is not extracted (a Resource API) |
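To make the idea of a Resource API over an unextracted tree more concrete, here is a small in-memory sketch. The layout (one concatenated buffer plus an index of offsets) and the names `packTree` and `readResource` are assumptions for illustration, not a proposed format.

```javascript
'use strict';

// Hypothetical in-memory layout: every file body is concatenated into one
// Buffer, and an index maps each path to its offset and length.
function packTree(files) {
  const index = new Map();
  const bodies = [];
  let offset = 0;
  for (const [name, body] of Object.entries(files)) {
    const bytes = Buffer.from(body);
    index.set(name, { offset, length: bytes.length });
    bodies.push(bytes);
    offset += bytes.length;
  }
  return { index, data: Buffer.concat(bodies) };
}

// Hypothetical Resource API: look up one asset without touching the others.
function readResource(bundle, name) {
  const entry = bundle.index.get(name);
  if (entry === undefined) {
    throw new Error(`No such resource: ${name}`);
  }
  // subarray() returns a view; no bytes are copied or written to disk.
  return bundle.data.subarray(entry.offset, entry.offset + entry.length);
}

const bundle = packTree({
  'package.json': '{"name":"app"}',
  'lib/index.js': 'module.exports = 42;',
});
console.log(readResource(bundle, 'lib/index.js').toString());
// prints: module.exports = 42;
```

Because the index stores offsets, a dependency graph could be audited by walking the index alone, without recreating the directory structure on disk.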

Future Potential

I believe the following are best left to the future, but they should still be considered in any design decisions.

Encryption

It should be possible to use encryption to generate a distributable that requires a key to be decrypted.

Privacy

A modest level of code privacy can be obtained in combination with, for example, the new bytecode system in V8, which allows declaring the source string such that the JS debugger would not be able to see the source that generated the bytecode.

De-duplication

Sometimes there are many duplicate files within a single application due to things like LICENSE files. When extracted to disk, a common solution is to use hard links; it should likewise be possible for a distributable to avoid declaring the same file body repeatedly.

| feature | requirements |
| --- | --- |
| de-duplication | format should be extensible to have a hardlink like mechanism |
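A hardlink-like mechanism could, for instance, be content addressed: paths point at a hash, and each distinct body is stored once. The sketch below assumes SHA-256 and a hypothetical `dedupe` helper; it is not a proposed file format.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical format: paths map to a content hash, and each distinct body
// is stored once under that hash, much like hard links to shared content.
function dedupe(files) {
  const paths = {};
  const blobs = new Map();
  for (const [name, body] of Object.entries(files)) {
    const bytes = Buffer.from(body);
    const hash = crypto.createHash('sha256').update(bytes).digest('hex');
    if (!blobs.has(hash)) blobs.set(hash, bytes);
    paths[name] = hash;
  }
  return { paths, blobs };
}

const packed = dedupe({
  'node_modules/a/LICENSE': 'MIT License ...',
  'node_modules/b/LICENSE': 'MIT License ...',
  'index.js': 'console.log("hi");',
});
console.log(Object.keys(packed.paths).length); // 3
console.log(packed.blobs.size); // 2
```

The two LICENSE paths resolve to the same stored body, so the bundle carries three paths but only two bodies.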

Poly-distributables

When creating distributions, it is sometimes useful to include multiple variations of an application for testing, localization, etc. This may be due to lack of tooling on a target machine, or it may be to ease a specific workflow.

Distributables could allow some mechanism by which they can include multiple variants within a single distributable for their internals based upon things such as: language, architecture, debugging, etc. This most likely is not intrinsic to the file format, but more likely revolves around intelligent entry points.

| feature | requirements |
| --- | --- |
| poly-distributable | format should not prevent userland from making runtime based routing decisions |
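Runtime based routing in userland might look like the following bootstrap logic. The variant keys ("linux-x64", "win32", "default"), the paths, and the `selectVariant` helper are all hypothetical; the point is only that the choice is made by ordinary JavaScript rather than by the file format.

```javascript
'use strict';

// Hypothetical bootstrap logic: pick the most specific variant that matches
// the running process, falling back to a platform-neutral default.
function selectVariant(variants, platform, arch) {
  const candidates = [`${platform}-${arch}`, platform, 'default'];
  for (const key of candidates) {
    if (Object.prototype.hasOwnProperty.call(variants, key)) {
      return variants[key];
    }
  }
  throw new Error(`No variant for ${platform}-${arch}`);
}

const variants = {
  'linux-x64': './variants/linux-x64/main.js',
  'win32': './variants/win32/main.js',
  'default': './variants/js-only/main.js',
};

// A real entry point would require() the selected path.
console.log(selectVariant(variants, process.platform, process.arch));
```

The same pattern extends to language or debug variants by adding more keys to the candidate list.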

Shared library

This is the most complicated problem to solve. Many times, when creating applications, duplicate dependencies are extracted to disk in multiple places. Some attempts to avoid this exist, such as using symlinks to a global cache. This should be looked at in order to reduce the complexity of auditing and updating, as well as file size. In particular, having a standard signature format is not enough.

The layout of a distributable is not in scope of this proposed feature; that is left to package managers. Existing package managers can have complex relations where nested packages are not easily expressed using just symlinks due to how realpathing works in Node's module system.

I suspect, however, that a well defined shared cache keyed on the signature of a distributable could be shared amongst all the package managers.
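As a rough sketch of such a cache, the snippet below stores each distributable under a name derived from its contents. Note the assumptions: a SHA-256 digest stands in for whatever signature format is eventually chosen, and `cachePathFor` and `storeInCache` are hypothetical names.

```javascript
'use strict';
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical cache layout: <cacheRoot>/<sha256 of the distributable>.
function cachePathFor(cacheRoot, bytes) {
  const digest = crypto.createHash('sha256').update(bytes).digest('hex');
  return path.join(cacheRoot, digest);
}

// Store the bytes once; later callers with identical bytes reuse the file.
function storeInCache(cacheRoot, bytes) {
  const target = cachePathFor(cacheRoot, bytes);
  fs.mkdirSync(cacheRoot, { recursive: true });
  try {
    // 'wx' fails if the file already exists, so an entry is never rewritten.
    fs.writeFileSync(target, bytes, { flag: 'wx' });
  } catch (err) {
    if (err.code !== 'EEXIST') throw err;
  }
  return target;
}
```

Two package managers that store the same bytes would end up with the same cache entry, regardless of how each lays out its own dependency tree.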

UX Bikeshedding

All names/UX are open to improvement. I will be using .distributable as an opaque file extension.

node app.distributable

Runs the entry point to app.distributable

node --create-distributable path/to/app/ --crt ...
node --create-distributable path/to/module/ --crt ...

Creates a distributable, optionally using --crt to declare signing information.

node --extract-distributable app.distributable

Extract app.distributable to the current directory. This is probably not terribly useful except to try and muck with source.

node --install-distributable app.distributable

Similar to npm --global install, place app.distributable in a configured place and set up permissions and command shims.

node --verify-distributable app.distributable

Automatically run on install or extract, this checks app.distributable against Node's verification policy (signature may not be the only criteria).

cd git/nodejs/node
./configure --main=app.distributable
make

Create a single binary which runs app.distributable when started. Does not include debugger or REPL by default.

Qard commented 7 years ago

  1. A core tar module might help here.
  2. What about dependency deduping? It'd be a shame to throw that away.
  3. For native code, I wonder if just using embedded resources in .node files would be worth considering. It's possible everything could just be a .node file which includes some extra stuff in it, and things could even be precompiled as bytecode as part of that. I know .node files are platform-specific, but that might actually be a good thing--one could create platform-specific builds with different dependency sets and less conditional branching.

bmeck commented 7 years ago

@Qard

  1. tar is not suitable for binaries since it cannot be concatenated to the end of a binary and then used for so-called "self extracting" executables. It also lacks random access, which means lookups could be very slow on large files.
  2. I think this can be added after the fact if we choose a good file format that would allow hard linking like behavior.
  3. That somewhat works, but it is platform specific; I think most people just want to run the command once and run the .distributable on all machines if their source is only JavaScript. Poly-distributables can be made by having a simple JS bootstrap that routes to internal assets within a .distributable appropriately without enforcing it in the file format.
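The concatenation point in item 1 can be sketched with a trailer that is read from the end of the file, which is broadly how zip archives locate their central directory. Everything specific here is an assumption for illustration: the `DIST` marker, the 8-byte trailer layout, and the `appendPayload` and `readPayload` names.

```javascript
'use strict';

const MAGIC = Buffer.from('DIST'); // hypothetical 4-byte marker
const TRAILER_SIZE = MAGIC.length + 4; // marker + 32-bit payload length

// Append a payload and a fixed-size trailer to an existing executable image.
function appendPayload(executable, payload) {
  const trailer = Buffer.alloc(TRAILER_SIZE);
  MAGIC.copy(trailer, 0);
  trailer.writeUInt32LE(payload.length, MAGIC.length);
  return Buffer.concat([executable, payload, trailer]);
}

// Locate the payload by reading the trailer at the very end of the file;
// the leading executable bytes never need to be parsed.
function readPayload(file) {
  if (file.length < TRAILER_SIZE) return null;
  const trailerStart = file.length - TRAILER_SIZE;
  if (!file.subarray(trailerStart, trailerStart + MAGIC.length).equals(MAGIC)) {
    return null;
  }
  const length = file.readUInt32LE(trailerStart + MAGIC.length);
  if (length > trailerStart) return null; // corrupt length field
  return file.subarray(trailerStart - length, trailerStart);
}

const combined = appendPayload(Buffer.from('fake-binary'), Buffer.from('payload'));
console.log(readPayload(combined).toString()); // prints: payload
```

Because the trailer sits at a fixed distance from the end, the payload can be found without scanning the executable that precedes it.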

refack commented 7 years ago

@vkurchatkin thanks for the pointer. IMHO it's good to have all the dups listed here 👍

bmeck commented 7 years ago

This should also be considered in light of the Nvidia situation going on.

ljharb commented 7 years ago

@bmeck what nvidia situation?

bmeck commented 7 years ago

@ljharb http://blog.sec-consult.com/2017/04/application-whitelisting-application.html

refack commented 7 years ago

@bmeck beautiful 🤦 they build and sign it themselves... If I was microsoft I'd revoke their certificate.

pmq20 commented 7 years ago

Might be relevant: https://github.com/pmq20/node-compiler

refack commented 5 years ago

Put into https://github.com/nodejs/node/projects/13 backlog

prettydiff commented 5 years ago

I only just noticed this thread. A few years ago I wrote a dependency management tool that intentionally avoids a centralized repository. If it's helpful, you guys can take the concept as a starting place. It is completely distributed, avoids all concepts of a centralized repository, and produces a single zip file. It uses the OS's embedded zip tools to produce a zip file. I was using SHA512 hashes to identify packages by version before Yarn was published with a similar concept.

https://github.com/prettydiff/biddle

I stopped working on this because there was 0 interest at the time. I was encouraged to give up the idea and resume work on other projects.