oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/

ExecJS backend based on GraalJS #2169

Closed by eregon 2 years ago

eregon commented 3 years ago

It would be nice to have a Graal.js backend for https://github.com/rails/execjs. And that could be useful for https://github.com/oracle/truffleruby/issues/1827.

This is a clear use case for polyglot, and it also avoids needing to use slow external backends, or V8-based backends which seem much heavier and harder to support. Using Graal.js means running TruffleRuby in --jvm --polyglot mode (available as truffleruby-graalvm/truffleruby+graalvm for Ruby installers).

I got an initial proof of concept in https://github.com/rails/execjs/compare/master...eregon:graaljs-runtime which passes some tests.

Internal issue: GR-27621

chrisseaton commented 3 years ago

Yeah would be great. There was a toy shim in master a long time ago?

eregon commented 3 years ago

There was a prototype specific to MiniRacer by @aardvark179 a long time ago (internal PR 75).

Current approach uses Polyglot.eval() for simplicity but as the TODO says we should use inner contexts or something similar for isolation.

eregon commented 3 years ago

All 198 tests actually pass now with the updated branch and fixes which have been merged in Graal.js and TruffleRuby!

Still need to actually isolate the ExecJS contexts for correctness and to not need --single-threaded.

eregon commented 3 years ago

I forgot to post an update here, but I now have an ExecJS backend using Graal.js, with proper isolation and passing all tests: https://github.com/rails/execjs/compare/master...eregon:graaljs-runtime-inner-context-serialize

There is a performance issue, though: a RubyContext is initialized for each inner context/ExecJS::Context, which is unnecessary and takes some time. That probably needs some new Truffle API to fix it.

SamSaffron commented 3 years ago

One concern with only doing exec js is that we would miss out on the richness a MiniRacer type interface offers.

require "mini_racer"

context = MiniRacer::Context.new
# Expose a Ruby proc to JS under the name math.adder
context.attach("math.adder", proc{|a,b| a+b})
puts context.eval 'math.adder(20,22)'
# => 42

This is quite critical for quite a few tight integrations of js/ruby. Full shims for every object is overkill, but the simple ability to mount a function call internal to the JS engine is quite important.

Any thoughts on targeting a partial MiniRacer API, specifically function mounting?

eregon commented 3 years ago

Is that used in Discourse? I think we'd need some kind of MiniRacer backend to support that. I started with ExecJS because it's simpler essentially.

The semantics seem interesting here: the JS context is isolated, but the Ruby Proc is not? I guess the arguments and the return value are serialized between Ruby and JS? (otherwise the JS contexts are not truly isolated anymore)

If there was no isolation that pattern would be trivial to support on GraalVM and we wouldn't need any serialization, but I guess isolation between contexts is a key feature of ExecJS/MiniRacer, right? It also enables running separate JS contexts in parallel which would otherwise not be possible due to JS "no shared state parallelism" semantics.

SamSaffron commented 3 years ago

Yes we use MiniRacer directly in Discourse specifically due to the attach api.

When cooking Markdown for example we can call into MRI Discourse to figure out if a user that was @mentioned is really a user or not.

There are certainly ways to avoid needing attach (use HTTP calls from JS back into Ruby for example) but it would get very complicated.

We isolate as much as we can on all boundaries, we only move serialized copies out of the JS VM into the Ruby VM and vice-versa.
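The serialized boundary described here can be sketched in plain Ruby. This is only an illustration, not MiniRacer's implementation: `SerializingContext` and `call_from_guest` are hypothetical names, and JSON stands in for whatever serialization the real boundary uses. The point it shows is that the attached proc only ever sees copies.

```ruby
require "json"

# Hypothetical stand-in for an isolated JS context: attached Ruby callables
# are reachable only through JSON copies of arguments and results.
class SerializingContext
  def initialize
    @attached = {}
  end

  # Register a Ruby callable under a name visible to the guest side.
  def attach(name, callable)
    @attached[name] = callable
  end

  # Simulates a call coming from the guest: arguments arrive as a JSON
  # string and the result leaves as a JSON string, so no object is shared.
  def call_from_guest(name, json_args)
    args = JSON.parse(json_args)
    result = @attached.fetch(name).call(*args)
    JSON.generate(result)
  end
end

context = SerializingContext.new
context.attach("math.adder", proc { |a, b| a + b })
puts context.call_from_guest("math.adder", JSON.generate([20, 22]))
```

Because every value is re-parsed on the way in, a proc that mutates its argument cannot affect the caller's object.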

As you mentioned the isolation allows for extra parallelism, I would not say it is a "must have" though for a GraalVM implementation which would not have the same concerns. You have no GIL and get to share the same GC implementation between the JS and Ruby.

The super tight boundaries in MiniRacer are only there due to the fact we are running MRI/V8 concurrently and can not afford to have cross GC deadlocks and so on.

Sounds to me like the truffle implementation here could be different / simpler while retaining extreme safety given you share the runtime.

eregon commented 3 years ago

No isolation would mean global variables in JS are shared between all MiniRacer::Context though, wouldn't that be an issue? (e.g., we found that 1 ExecJS test breaks because of that). OTOH, it would mean the framework/library JS code could be loaded only once and then naturally reused.

It would be easy to make a prototype for that. However, Graal.js unfortunately does not support being executed in parallel in the same context, because JavaScript as a language is fairly explicit that no shared-memory parallelism is allowed (in the same context/isolate/global state). Currently the check for that is too strict and actually makes the first Ruby Thread.new fail. That might be relaxed, but we'd still only be able to call into Graal.js sequentially (e.g., with a Mutex around each call), which sounds like too big a limitation.
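A minimal sketch of the "Mutex around" option, to make the limitation concrete. `SequentialEngine` is a hypothetical wrapper and the block passed to it merely stands in for a shared JS engine; nothing here is Graal.js API.

```ruby
# Hypothetical wrapper: every call into the single shared engine goes
# through one lock, so calls never overlap even with many Ruby threads.
class SequentialEngine
  def initialize(&engine)
    @engine = engine
    @lock = Mutex.new
  end

  def evaluate(source)
    @lock.synchronize { @engine.call(source) }
  end
end

# The block is a stand-in for a JS engine that must not run in parallel.
calls = []
engine = SequentialEngine.new do |source|
  calls << source
  source.upcase
end

threads = 4.times.map { |i| Thread.new { engine.evaluate("call #{i}") } }
results = threads.map(&:value)
```

All four threads get an answer, but only one is ever inside the engine at a time, which is the bottleneck the comment objects to.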

I think the only realistic way here is to use inner contexts, then we should be able to be much more compatible with the standard MiniRacer. It's some work but I think we already have all the pieces needed for it.

SamSaffron commented 3 years ago

Oh my ... I see ... what a pickle. Yes, we allow (and often use) multiple isolated contexts from one process.

One (somewhat ugly) option is to spawn off a process per context and use a named pipe to issue commands (serializing calls back to Ruby, etc.). It would make Context creation a rather expensive operation, but actual calls could be reasonably fast.

How do you envisage inner contexts working?
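A rough sketch of the process-per-context idea, under several assumptions: it uses anonymous pipes from `Open3.popen2` rather than a named pipe, the child is a Ruby process standing in for a JS engine, and `ProcessContext` and the one-JSON-object-per-line protocol are made up for illustration.

```ruby
require "json"
require "open3"
require "rbconfig"

# Child loop: reads one JSON request per line, writes one JSON reply per line.
# It stands in for a process hosting one JS context; here it only sums numbers.
CHILD_SOURCE = <<~'RUBY'
  require "json"
  $stdout.sync = true
  $stdin.each_line do |line|
    request = JSON.parse(line)
    puts JSON.generate("result" => request["args"].sum)
  end
RUBY

# Hypothetical context backed by its own OS process.
class ProcessContext
  def initialize
    # Spawning the process is the expensive part of creating a context.
    @stdin, @stdout, @wait = Open3.popen2(RbConfig.ruby, "-e", CHILD_SOURCE)
  end

  # Each call is one serialized request and one serialized reply.
  def call(*args)
    @stdin.puts(JSON.generate("args" => args))
    JSON.parse(@stdout.gets).fetch("result")
  end

  def close
    @stdin.close   # the child sees EOF and exits its loop
    @stdout.close
    @wait.value    # reap the child process
  end
end

context = ProcessContext.new
sum = context.call(20, 22)
context.close
```

Calls from JS back into Ruby would need a second message type on the same channel, which this sketch leaves out.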

eregon commented 3 years ago

There is great progress here thanks to improvements to inner contexts in Truffle (upcoming PR, but it seems almost ready). With that, it's convenient and efficient to use inner contexts for ExecJS: https://github.com/rails/execjs/compare/master...eregon:graaljs-runtime-inner-context-convert

The ExecJS tests run in 3.3s, vs 22s when a RubyContext was initialized in each inner context, vs 2s with no isolation at all (which is incorrect, of course). For comparison, the tests take 24s when shelling out to node.

These improvements add TruffleContext#eval and automatically wrap the result or exception so that all InteropLibrary messages are forwarded to the context the object belongs to. That makes it possible to, e.g., access a JS object or call methods on it conveniently from the outer context, without having to serialize everything to JSON in JS and deserialize it in Ruby.

I think we could support the MiniRacer attach API easily with this new capability, probably just with context.eval("(function(value) { math.adder = value })").call(proc{|a,b| a+b}) on the inner context. That automatically wraps the Ruby Proc, so the Proc is actually executed in the outer context (the only correct context to execute it).

SamSaffron commented 3 years ago

> I think we could support the MiniRacer attach API easily with this new capability,

Amazing! Would love to try this out!

eregon commented 2 years ago

There has been a Graal.js backend in execjs since 22.2 or earlier, and it works well: https://github.com/rails/execjs/blob/master/lib/execjs/graaljs_runtime.rb

It is automatically selected when running TruffleRuby+GraalVM with TRUFFLERUBYOPT="--jvm --polyglot" and with js installed (gu install js). If any of these parts is missing, it warns when Ruby is in $VERBOSE=true mode (or -w).
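The selection conditions above could be checked roughly like this. It is a simplified sketch, not the code in graaljs_runtime.rb: `graaljs_usable?` is a hypothetical helper, it only warns in the missing-js case, and reading the installed languages from `Polyglot.languages` on TruffleRuby is an assumption.

```ruby
# Hypothetical helper mirroring the conditions listed above.
# engine:    the Ruby engine name (RUBY_ENGINE)
# languages: names of the polyglot languages available to the process
def graaljs_usable?(engine:, languages:)
  return false unless engine == "truffleruby"

  unless languages.include?("js")
    # Only speak up in verbose mode (ruby -w or $VERBOSE = true).
    if $VERBOSE
      warn "js is not available; try TRUFFLERUBYOPT='--jvm --polyglot' and `gu install js`"
    end
    return false
  end

  true
end

# Assumption: on TruffleRuby, Polyglot.languages lists the available languages.
languages = defined?(Polyglot) ? Polyglot.languages : []
puts graaljs_usable?(engine: RUBY_ENGINE, languages: languages)
```

Passing the engine and language list as arguments keeps the check easy to exercise on any Ruby.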