nucleus-js / design

This repo is for the core design, discussion, spec, and tests for nucleus implementations.

Design nucleus. #1

Closed creationix closed 8 years ago

creationix commented 8 years ago

The basic goal of this project is to implement a tiny core runtime that contains libuv, javascript and some essential C libraries (like openssl) needed to re-implement node.js in userland as modules.

If possible this will be backend agnostic and allow multiple JS engines.

See also https://github.com/nodejs/node/issues/7098

creationix commented 8 years ago

I think a first step would be to define an interface which all backend implementations should adhere to. One of the goals here is to avoid C/C++ addons so we don't need to worry about a public facing C or C++ API for addons initially.

Since it's a royal pain to try and get all JS engines to conform to a least-common-denominator interface, let's instead have independent implementations for each engine that all match some JS interface spec. Then modules written for one runtime will work for all so long as they don't use language features unique to a particular runtime (V8 and Chakra for example have most/all of ES6 while duktape is mostly ES5, but has lua style coroutines).

Once we have a common interface, we can start implementing the C parts for the various runtimes. I humbly suggest the I/O parts be directly designed to match libuv.

Fishrock123 commented 8 years ago

I think this will also be greatly simplified if we use things such as @mscdex's work to make the dns resolver be pure-js and not use c-ares https://github.com/nodejs/node/pull/1843, and the http-parser: https://github.com/nodejs/node/pull/1457

Fishrock123 commented 8 years ago

> I think a first step would be to define an interface which all backend implementations should adhere to.

Sounds like @trevnorris's original API WG goals.

trevnorris commented 8 years ago

:-) I've had a long-standing goal to create an API for node that serves as a strict entry point into C++, which basically all code in lib/ would use. Alas, time constraints.

mscdex commented 8 years ago

As I mentioned in the linked DNS PR, it is difficult to get even close to matching the performance of c-ares/libc, even when using node's C++ UDP bindings directly. That pretty much rules out any performance issues in js land, so the C++ layer would have to be improved (if possible) to be able to compete with c-ares and/or the system resolver.

Regarding http, I haven't compared benchmarks since @indutny incorporated the JS stream stuff that bypasses js land when doing http parsing, so I'm not sure how the pure js http parser fares anymore.

creationix commented 8 years ago

As for DNS resolving, in luvit we have two paths. One is pure Lua on top of libuv's UDP primitives and is used for advanced queries. For basic resolution of domain names to IP addresses, we use libuv's getnameinfo and getaddrinfo, which I believe use the system library on a thread pool. We have had no performance issues with this. Both are pure script on top of what libuv provides natively, which will be provided in the C core.
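To make the "pure script on top of UDP primitives" path concrete, here is a minimal sketch of building a DNS query packet in plain JavaScript, following the RFC 1035 message layout. The function name `encodeDnsQuery` is hypothetical and not part of luvit or nucleus. Sending the bytes is left to whatever UDP binding the core ends up exposing.

```javascript
// Hypothetical helper: encode one DNS question for `name` as raw bytes.
// Only ASCII host names are handled; this is an illustration, not a resolver.
function encodeDnsQuery(name, id, qtype) {
  var labels = name.split('.').filter(function (l) { return l.length > 0; });
  // 12-byte header + (length byte + label bytes) per label + root byte + QTYPE + QCLASS
  var size = 12 + labels.reduce(function (n, l) { return n + 1 + l.length; }, 0) + 1 + 4;
  var buf = new Uint8Array(size);
  var view = new DataView(buf.buffer);
  view.setUint16(0, id);      // transaction ID (DataView defaults to big-endian)
  view.setUint16(2, 0x0100);  // flags: standard query, recursion desired
  view.setUint16(4, 1);       // QDCOUNT: one question
  // ANCOUNT, NSCOUNT and ARCOUNT stay zero
  var offset = 12;
  labels.forEach(function (label) {
    if (label.length > 63) throw new Error('label too long: ' + label);
    buf[offset++] = label.length;
    for (var i = 0; i < label.length; i++) buf[offset++] = label.charCodeAt(i);
  });
  buf[offset++] = 0;               // root label terminates the name
  view.setUint16(offset, qtype);   // QTYPE (1 = A record)
  view.setUint16(offset + 2, 1);   // QCLASS (1 = IN)
  return buf;
}

var query = encodeDnsQuery('example.com', 0x1234, 1);
console.log(query.length);
```

Parsing the response is the larger half of the work, but it uses the same byte-level techniques.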

creationix commented 8 years ago

Let's not get too tripped up on edge performance issues. The goal here isn't to win synthetic benchmarks with the vanilla flavor of the minimal core. We will have options where people can build different flavors of the core with various libraries included (like openssl, cares, http_parser, etc). If you're deploying a large enough system where these performance issues are actually a problem, then you don't mind compiling a little C code. But for most projects and development workflows, this is not critical.

In luvi, there are two main flavors known as "tiny" and "regular" with the biggest difference being that regular includes openssl and a couple of lesser-used C addons in its core. For many cases, http servers don't need openssl since they are running behind a reverse proxy anyway that handles the TLS termination. Things like MD5, SHA1, etc can usually be handled just fine (and sometimes even faster) in pure script.

indutny commented 8 years ago

Wow, I really like this proposal. I was working on something similar recently:

It is a modular C stream implementation. Not sure how useful it is, but it could be a good enough interface for interactions between C addons.

creationix commented 8 years ago

@indutny I saw those. I've always said libuv should have an extension community where things are written in C and can be used by all runtimes that consume libuv. Those could be included as well in the core if they are tiny (which I expect) and in optional addons if not.

If they are only to be consumed by other C code and it doesn't make sense to expose them to JS, that's fine. It will still be useful for addons to core that can use them.

creationix commented 8 years ago

I started a new issue for designing the libuv -> js mapping interface that all implementations must adhere to. https://github.com/creationix/nucleus/issues/2

indutny commented 8 years ago

uv_link_t by itself is very small. uv_ssl_t is a bit bigger.

creationix commented 8 years ago

So I think we could say a nucleus implementation contains:

  1. A JS runtime engine
  2. libuv
  3. Bindings to libuv for said engine (exposing the standard interface)
  4. Glue to make applications.
  5. Other optional C modules.

I think for part 4, we should follow the pattern in luvi. This means including minimal code for reading zip files. I have a modified version of miniz, with bugs fixed and missing features added, that works great for this and is super tiny.

This will expose a bundle API that allows scripts to read in the virtual filesystem that can either be a zip file (standalone or appended to nucleus) or a folder on disk.

We will also have some minimal hooks that make bootstrapping a require system in userland less painful. For example, it can look for a file bundle:deps/require.js or something and auto-run it if it exists before running bundle:main.js.
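A rough sketch of that bootstrapping order, assuming a bundle object with `exists` and `readFile` methods. Both method names, the `makeBundle` helper, and the use of `new Function` in place of an engine-level compile call are assumptions made for illustration; nothing here is the designed API.

```javascript
// Hypothetical stand-in for the bundle: maps paths inside the bundle to source text.
// A real implementation would read from a zip file or a folder on disk.
function makeBundle(files) {
  return {
    exists: function (path) {
      return Object.prototype.hasOwnProperty.call(files, path);
    },
    readFile: function (path) {
      if (!this.exists(path)) throw new Error('not in bundle: ' + path);
      return files[path];
    }
  };
}

// Run the optional bootstrap script first, then the main script.
// Returns the list of scripts that were actually run, in order.
function boot(bundle, globals) {
  var ran = [];
  ['deps/require.js', 'main.js'].forEach(function (path, index) {
    if (index === 0 && !bundle.exists(path)) return; // the bootstrap hook is optional
    // new Function stands in for the engine's own compile primitive
    var run = new Function('globals', bundle.readFile(path));
    run(globals);
    ran.push(path);
  });
  return ran;
}

var globals = {};
var ran = boot(makeBundle({
  'deps/require.js': 'globals.require = function (name) { return "module:" + name; };',
  'main.js': 'globals.result = globals.require("app");'
}), globals);
console.log(ran, globals.result);
```

The point of the ordering is that `main.js` can rely on whatever the bootstrap script registered without the core knowing anything about module systems.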

mmicko commented 8 years ago

Do not wish to disappoint you but there is already something similar in the works https://github.com/saghul/sjs

Have to say I am interested in this kind of project, since I think these can replace Lua with JavaScript (which more people are familiar with) for scripting their software. Also there is a large library of nodejs-compatible JavaScript modules; making it possible to use them from the application itself would be a great plus.

My suggestion would be to go for C++11/14 support, and not just plain C. Exposing an API and enabling users to expose their classes to JavaScript is very useful. There is the LuaBridge project for Lua that enables you to expose your classes and objects to a Lua engine. Doing a similar out-of-the-box solution would make integration with user code even easier.

Note that Lua and duktape are quite similar in design so similar patterns can be used.

If you go this or similar way "you have my axe" :)

dlmanning commented 8 years ago

Am also super into this.

Plain C makes for easier interop with whatever other language one might be interested in calling from (e.g. Rust).

creationix commented 8 years ago

@mmicko I'm not disappointed, I know about sjs and even linked to it in the parent conversation in the nodejs issue. From my initial browsing however, sjs is much more high-level and opinionated than what this project is aiming for.

Fishrock123 commented 8 years ago

> Glue to make applications.

@creationix We'll probably need a good amount of process, and some sort of module... bootstrapping at least. (Or maybe we just use ES modules?)

Is that what you meant by "glue"?

mmicko commented 8 years ago

@creationix good to hear that

@dlmanning understand that C API is easiest to combine with other languages, just pointing that C++11/14 support would be quite welcome

creationix commented 8 years ago

@mmicko Also since we're fixing the interop level at the JS interface exposed by the C/C++ backend we don't need to standardize on a language/version. The duktape backend might be all C89 while the V8 backend will obviously have some C++ involved. The common glue layer can even have multiple implementations if needed as long as the JS interface matches the spec. This is why it's important to define the interface clearly.
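One way to picture "fixing the interop level at the JS interface" is a conformance check that any backend can be run against. The sketch below is only an illustration: the entries in `spec` are placeholders, not the real nucleus interface, and `missingFromSpec` is a made-up helper.

```javascript
// Hypothetical spec: property names and expected typeof results are placeholders.
var spec = {
  readfile: 'function',
  scandir: 'function',
  uv: 'object'
};

// Return the spec entries that a backend fails to provide with the expected type.
function missingFromSpec(backend, spec) {
  return Object.keys(spec).filter(function (key) {
    // typeof null is 'object', so null is rejected explicitly
    return typeof backend[key] !== spec[key] || backend[key] === null;
  });
}

// Two pretend backends: how each is implemented underneath (C89, C++, ...) is
// invisible here; only the JS-visible shape is compared.
var complete = { readfile: function () {}, scandir: function () {}, uv: {} };
var incomplete = { readfile: function () {}, uv: null };

console.log(missingFromSpec(complete, spec));   // nothing missing
console.log(missingFromSpec(incomplete, spec)); // scandir and uv are reported
```

A real test suite would also exercise behaviour, not just presence, but the same idea applies: the tests are written once in JS and run against every backend.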

Fishrock123 commented 8 years ago

Note: using just ES modules is quite incompatible with the current node ecosystem, so we'd still have to have some module bootstrapping available for the module module I think. (& It would probably still have to be passed to scripts implicitly, like require. ...So it would probably have to be a part of the nucleus, I think.)

creationix commented 8 years ago

I don't want the module system to be part of the core glue. All we need is some conventions for bootstrapping a module system of choice. I really don't want things like node's global process in this layer.

For the curious, you can see how luvit accomplished this. Both process and require are implemented in userspace, as modules.

creationix commented 8 years ago

@Fishrock123 I envision two parts.

  1. The core API will provide things like loading files by path, scanning directories, getting cwd, getting environment variables, getting path to main binary.

    It would also expose the JS runtime with API functions for compiling strings into code (with filename and ES goal type)

  2. The hook will simply auto-run a file with a certain filename so that it can self-register before the main file is run.

Would this not be enough? What APIs exactly would need to be provided for a module system to be implemented?
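As a sanity check on that question, here is a minimal sketch of a CommonJS-style loader built on just two primitives: reading a file by path and compiling a string into code. The `core.readFile` and `core.compile` names are hypothetical, the in-memory `files` table stands in for the disk or bundle, and `new Function` stands in for the engine's compile call.

```javascript
// Hypothetical core primitives backed by an in-memory file table.
var files = {
  '/app/main.js': 'var math = require("./math.js"); module.exports = math.add(2, 3);',
  '/app/math.js': 'exports.add = function (a, b) { return a + b; };'
};
var core = {
  readFile: function (path) {
    if (!Object.prototype.hasOwnProperty.call(files, path)) {
      throw new Error('no such file: ' + path);
    }
    return files[path];
  },
  // Stand-in for "compile a string into code with a filename"; the sourceURL
  // comment only affects stack traces in engines that honour it.
  compile: function (source, filename) {
    return new Function('require', 'module', 'exports',
      source + '\n//# sourceURL=' + filename);
  }
};

// Userland module system built only on the two primitives above.
function makeRequire(core, baseDir, cache) {
  return function require(request) {
    // Only "./name" relative to baseDir is handled; real resolution is more involved.
    var path = baseDir + '/' + request.replace(/^\.\//, '');
    if (cache[path]) return cache[path].exports;
    var module = { exports: {} };
    cache[path] = module; // cache before running so cycles see partial exports
    var dir = path.slice(0, path.lastIndexOf('/'));
    var fn = core.compile(core.readFile(path), path);
    fn(makeRequire(core, dir, cache), module, module.exports);
    return module.exports;
  };
}

var result = makeRequire(core, '/app', {})('./main.js');
console.log(result);
```

Directory scanning, cwd, and environment access would only be needed for node-style lookup rules (such as walking up through node_modules folders), not for the basic mechanism.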

For luvit's require which is modeled after node's I basically needed:

creationix commented 8 years ago

@Fishrock123 I think the simplest way to expose the builtin C modules without depending on a module system is to have some global object (like global.NUCLEUS) that exposes the various builtin modules. Userspace module systems could then expose a uniform interface where require('uv') simply returns global.NUCLEUS.uv, but require('some-other') is handled by the custom loader.
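A minimal sketch of that split, assuming a placeholder object in place of the real uv binding. The loader is named `userRequire` here only to avoid clashing with any existing `require`, and `customLoader` is a hypothetical callback for everything that is not a builtin.

```javascript
// Placeholder for what the core would install before any script runs.
// The contents of the uv entry are made up for this example.
globalThis.NUCLEUS = { uv: { placeholder: 'uv binding' } };

// Userland loader: builtins come straight from the global table,
// everything else is delegated to the custom loader.
function makeUserRequire(customLoader) {
  return function userRequire(name) {
    if (Object.prototype.hasOwnProperty.call(globalThis.NUCLEUS, name)) {
      return globalThis.NUCLEUS[name];
    }
    return customLoader(name);
  };
}

var userRequire = makeUserRequire(function (name) {
  return 'loaded:' + name; // a real loader would find and run a file here
});

console.log(userRequire('uv') === globalThis.NUCLEUS.uv);
console.log(userRequire('some-other'));
```

Because the builtin table is plain data, two different userland module systems can expose it in two different ways without the core taking sides.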

domenic commented 8 years ago

You could even call it process.binding :trollface:

creationix commented 8 years ago

@domenic As I told @Fishrock123 in IRC, I'd like to avoid any name clashes with anything existing in node so I don't have to worry about matching semantics. This layer needs to have as little opinion as possible.

creationix commented 8 years ago

Also, process.binding will go away if this ever lands in core. And it will assuredly have a different shape.

creationix commented 8 years ago

@Fishrock123 I wrote up the beginnings of a README with the parts that are currently designed. This should help solidify the design goals a little.

creationix commented 8 years ago

@dlmanning see #3

dlmanning commented 8 years ago

@creationix : I am not as funny as I think I am...

drom commented 8 years ago

@creationix It would be nice if nucleus were available as a library for C++ embedding. I have used jxcore for this purpose: https://github.com/jxcore/jxcore/blob/master/doc/native/Embedding_Basics.md and quite liked it. But it is not supported anymore ;(

creationix commented 8 years ago

@drom I'm not sure there would be much in here apart from what's provided in the JS engines and the bindings. I'll try to make the various bindings independent enough that they could be used embedded in other projects.

chrisdickinson commented 8 years ago

Hi! I'm poking at something along the same lines over here. It builds and runs on linux (ubuntu trusty) and OSX thus far, and glues v8 to libuv & uv_link_t using gn.

It currently leans on a hacked-up version of chromium's build/ dir, which I'm tearing apart to get to the salient bits. The idea is to get it running on windows, osx, and linux first, then rewrite the build dir's gn stuff in a cleaner way to get to that end.

The experiment is thus:

  1. Get a minimal project that includes v8, libuv, and the various uv bits @indutny has been putting together building everywhere.
  2. At that point build in & expose fs, tcp, and tls bindings and a module system (via require) to js.
    • I might do this in a separate project using gclient & gn to pull in the minimal binding layer.
  3. Whenever a node global (process) is accessed, or a node builtin module is required (e.g. require('fs')), short circuit the lookup to require('@nojs/node-<target>').
  4. Long term goal is to get npm install working and bundle npm with the project.

My (handwave-y) plans are — and you'll each probably find something you like and something you dislike here:

In other words: I think this project and nojs are probably going to be walking along the same path for a bit, though it seems like eventually we'll have different goals. I'm happy to share the build code I've hacked together. Maybe making it easier to grab a compilable, working copy of libuv+v8 & friends will let a thousand Nodes bloom.

indutny commented 8 years ago

@chrisdickinson looks very cool! Though, you probably would like to use jit.js instead of heap.js, since the latter one is a JS VM Heap implementation...

creationix commented 8 years ago

@chrisdickinson thanks for the feedback. Indeed our goals are slightly different. Also I'll be starting with duktape and jerryscript as sample implementations of this interface as I abhor C++ and that steers me away from V8. Once I have things stable it would be awesome to use your code to make a V8 implementation.

Also the scope of this project seems to be a bit slimmer. I won't have any opinions at all regarding streams, promises, etc. I just want to provide a common base for tools to be built.

chrisdickinson commented 8 years ago

@indutny Ah indeed! I was thinking about repurposing this code to do the hop from JS to compiled code.

@creationix Cool — I wish you the best of luck! I'd definitely encourage checking out gn as a metabuild tool, it's slightly opaque but is pretty slick after a bit of use. I'm collecting a list of possibly handy links on the process of gluing stuff together.

indutny commented 8 years ago

@chrisdickinson https://github.com/js-js/jit.js/blob/master/src/jit.cc#L56-L96 ;)

dominictarr commented 8 years ago

I am certainly of the opinion that @creationix's opinionlite approach is the way to go. Streams should definitely not be in the "core", way too many opinions in streams. We even have @creationix's min-streams and my pull-streams because we couldn't agree on one thing, and they are incredibly simple!

I think a project like this is really a C project, it looks like it's about javascript but it's not. It's about finding a way for C libraries to easily plug into a thing, it seems to involve javascript, but would that even be necessary?

There are totally legitimate reasons not to include certain C libraries (personally, I'd like to be able to exclude openssl and build in libsodium instead; this would be ideal for secure decentralization projects). Clearly there are also different JS engines that target different use cases (jerryscript is low resource use vs v8 is performance).

I think that means that the particular C libraries used need to be loosely coupled, so that I just need to pull them in by editing a config (or package.json).

@drom's point about embedding as a library would be super valuable too - that would make this easy to deploy as an android app - just write a java binding to it and then embed directly into the same process.

dominictarr commented 8 years ago

but @chrisdickinson I think you are right about FFI. It's too hard to write a node binding, if you could just call a C function from "javascript" then we are done. Is that what you are thinking here?

dominictarr commented 8 years ago

even if I have to put the args I am calling into a buffer, that is still easier than the current way to write node bindings.
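To show roughly what "putting the args into a buffer" could look like from the JS side, here is a small sketch that packs typed arguments into an ArrayBuffer. The layout (arguments written back to back, little-endian, with only `i32` and `f64` supported) is an assumption made for this example, not a proposed calling convention, and the native side that would read the buffer is not shown.

```javascript
// Hypothetical layout: each argument is written in order as either a
// 32-bit signed integer ('i32') or a 64-bit float ('f64'), little-endian.
var SIZES = { i32: 4, f64: 8 };

function packArgs(types, values) {
  if (types.length !== values.length) throw new Error('arity mismatch');
  var total = types.reduce(function (n, t) {
    if (!Object.prototype.hasOwnProperty.call(SIZES, t)) {
      throw new Error('unknown type: ' + t);
    }
    return n + SIZES[t];
  }, 0);
  var view = new DataView(new ArrayBuffer(total));
  var offset = 0;
  types.forEach(function (t, i) {
    if (t === 'i32') view.setInt32(offset, values[i], true);
    else view.setFloat64(offset, values[i], true);
    offset += SIZES[t];
  });
  return view.buffer;
}

// Arguments for an imagined C function taking (int, double).
var packed = packArgs(['i32', 'f64'], [42, 1.5]);
console.log(packed.byteLength);
```

A real FFI layer would also have to deal with alignment, pointers, strings, and return values, which is where most of the difficulty lives.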

dominictarr commented 8 years ago

I should also point out that you don't actually need a module system. If you can run one javascript file, then you can statically link the javascript. i.e. with browserify, or noderify (which is assembled from browserify parts to make node.js scripts start really fast)

creationix commented 8 years ago

Initial core API is documented in the README and I just prototyped a duktape version (minus libuv and zip reading) that you can see in action.

See it in action https://asciinema.org/a/b0yk23l05yhrw9mlp0uqik6pp

creationix commented 8 years ago

@dominictarr while it's true you don't need a module system, I do love a workflow that doesn't have build steps. As I demonstrated in the asciicast, you can run apps directly out of the source tree while developing without needing to rebuild the final binary. If the JS needs to go through a build step it breaks this simple workflow.

dlmanning commented 8 years ago

Given that JS now has a module system in its specification, it would seem strange to not build it in, no?

dominictarr commented 8 years ago

@dlmanning sure, if you are using a javascript engine that implements modules, then you could have that. The engines that @creationix is talking about starting with, jerry-script and duktape, both implement ES5.1.

dlmanning commented 8 years ago

@dominictarr sorry, I missed the bit about starting with JerryScript

chrisdickinson commented 8 years ago

@dominictarr:

> but @chrisdickinson I think you are right about FFI. It's too hard to write a node binding, if you could just call a C function from "javascript" then we are done. Is that what you are thinking here?

Yep!

@dlmanning: Notably, the module system is only ~sorta implemented in stable V8's as well (flagged and, IIRC, incomplete.)

dlmanning commented 8 years ago

@chrisdickinson : sure, it's a work in progress, but it's in progress.

(Don't worry, I have no desire to turn this thread into another ES Modules debate)

trevnorris commented 8 years ago

On the side, about import: it's not possible to resolve a path at runtime, which makes development of native modules a little more painful when you simply want to run:

$ NODE_DEBUG=1 ./node_g /path/to/my/module

and have it automatically pick up the Debug build of the binary. Setting up the application in this way, I'd assume there would be more than a few native modules written to extend the basic functionality.
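For contrast, with a function-style loader the path is an ordinary string that can be computed at runtime. The sketch below assumes the usual node-gyp output folders (build/Debug and build/Release); the file name addon.node and the choice of NODE_DEBUG=1 as the switch are assumptions taken from the command above, not an established convention.

```javascript
// Compute the path of the native binary from the environment.
// `env` is passed in explicitly so the function does not depend on a process global.
function addonPath(moduleDir, env) {
  var flavor = env.NODE_DEBUG === '1' ? 'Debug' : 'Release';
  return moduleDir + '/build/' + flavor + '/addon.node';
}

var debugPath = addonPath('/path/to/my/module', { NODE_DEBUG: '1' });
var releasePath = addonPath('/path/to/my/module', {});
console.log(debugPath);
console.log(releasePath);
```

The resulting string would then be handed to whatever loading call the module system provides, which is exactly the step a static import declaration does not allow.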

dlmanning commented 8 years ago

@trevnorris : Seems like it would be good to provide a separate means of deliberately loading dynamically?

matthewp commented 8 years ago

Good choice on splitting the module system into user-land. I agree both with @creationix that having one is good for development and with @dominictarr that it isn't needed for production. Is main.js as an entry-point going to be configurable? I'd like to have a separate dev.js and prod.js so I can do both.

This is going to be amazing for transpile-to-js languages, you essentially get statically linked small(ish) binaries for free if you just choose JS as your target.

creationix commented 8 years ago

@matthewp luvi has an option to override the entry point, but it's tricky designing the CLI without resorting to environment variables that can cause security vulnerabilities.

That said, you can have a main.js that loads a real main of your choice based on some env or argument.
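A sketch of what such a main.js might do, using the dev.js and prod.js names from the question above. The `--entry=` flag and the `APP_ENTRY` variable are made up for this example, and the arguments and environment are passed in explicitly because how the core exposes them is not settled here.

```javascript
// Choose the real entry point from an argument or an environment variable.
// Only names in the allow-list are accepted, so neither input can point the
// loader at an arbitrary file.
function pickEntry(args, env) {
  var allowed = { dev: 'dev.js', prod: 'prod.js' };
  var choice = env.APP_ENTRY || 'prod'; // environment overrides the default
  args.forEach(function (arg) {
    // an explicit argument overrides the environment
    if (arg.indexOf('--entry=') === 0) choice = arg.slice('--entry='.length);
  });
  if (!Object.prototype.hasOwnProperty.call(allowed, choice)) {
    throw new Error('unknown entry: ' + choice);
  }
  return allowed[choice];
}

console.log(pickEntry([], {}));
console.log(pickEntry(['--entry=dev'], { APP_ENTRY: 'prod' }));
```

The returned file name would then be loaded with whatever module system the application bootstrapped.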