microsoft / CCF

Confidential Consortium Framework
https://microsoft.github.io/CCF/
Apache License 2.0

Design notes: V8 #3298

Closed letmaik closed 2 years ago

letmaik commented 2 years ago

This is an issue for keeping a log of design notes for integrating V8 into CCF. New notes after the initial comment will become new comments. Everyone should feel free to comment along the way.

What is this about?

Integrating V8 into CCF, initially as an additional app like js_generic, eventually integrated into the core.

Why V8?

Battle-tested (browsers, server runtimes) and feature-rich (e.g. Wasm support).

Is V8 faster than QuickJS?

According to https://bellard.org/quickjs/bench.html, yes. This is expected, since QuickJS does not have a JIT compiler. However, our own experiments with virtual-mode CCF in https://github.com/microsoft/CCF/pull/3258/ show that V8 is slower at the moment. This is likely because typical benchmarks (like those at https://bellard.org/quickjs/bench.html) re-use both a single JS context and the loaded scripts/modules. Under those conditions the JIT kicks in, especially when the benchmark runs for more than a few microseconds.

In CCF, each request gets a fresh JS context for security reasons, which adds overhead for context creation and for re-loading/evaluating modules. These overheads, especially context creation, appear to be lower in QuickJS, making it faster overall. Even if context creation were equally fast in V8, we would still not benefit from the JIT, since JIT-compiled code is not re-used across contexts.

We have some ideas for safely re-using a context in V8, which would avoid the performance issues described above. This will be explored in the near future.
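The warm-up effect described above can be observed in any V8-based host, e.g. Node.js. The following is an illustrative sketch, not a CCF benchmark; absolute timings are machine-dependent, the point is the cold-vs-warm shape:

```javascript
// Illustrative sketch (run under Node.js): an optimizing JIT only pays off
// when the same code runs many times in one context, which is exactly what
// a fresh-context-per-request model prevents.
function hot(n) {
  let s = 0;
  for (let i = 0; i < n; i++) s += i * i;
  return s;
}

// Cold: the first call runs in the interpreter/baseline tier.
const t0 = process.hrtime.bigint();
hot(1_000_000);
const coldNs = process.hrtime.bigint() - t0;

// Warm: after many calls, the optimizing compiler has typically kicked in.
for (let i = 0; i < 50; i++) hot(1_000_000);
const t1 = process.hrtime.bigint();
hot(1_000_000);
const warmNs = process.hrtime.bigint() - t1;

console.log({ coldNs, warmNs }); // warmNs is usually much smaller than coldNs
```

Tearing the context down after every request, as CCF does, keeps every call on the cold path.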

Does it work in SGX / OE?

Probably, but from experiments we know V8 requires some patches (so far: cpuid, time, semaphores, mmap, dynamic queries for page size) to deal with limitations of OE. This work is not yet complete, so OE is not supported.

Can it run in virtual mode for now?

Yes, virtual mode is working and has been implemented as an app like js_generic (QuickJS), currently called js_v8.

Does governance run through V8?

Not yet. This would be premature, and it is not generally possible while OE support is incomplete.

Can apps still be run with js_generic?

Yes, js_v8 is strictly experimental and optional at this point. js_v8 will likely be included in CCF releases as an additional (virtual) enclave image, with a disclaimer that it is for experimentation only.

Should V8 eventually replace QuickJS?

Yes. Maintaining two JS runtimes as (build/runtime) options would require too much effort.

V8 is huge, does this make build times long?

No, V8 will be pre-built and uploaded to storage in a separate CI job that is triggered manually, typically when updating to a new V8 version, but also when changing build flags etc. for the same version. Developers can then download pre-built static archives and headers from storage, in both release and debug variants. Local re-builds are also possible, independently of the CCF build (driven by a script in the CCF repo).

Is V8 built with the same toolchain as CCF?

Yes, the same compilers and standard libraries are used. The V8 version we use (9.4.146.*) assumes Clang 14. To build with Clang 10 (the version CCF currently targets), a minimal compiler wrapper script strips flags that Clang 10 does not support, such as newer compiler warning flags.

Is V8 always enabled in a CCF build?

No, there will be a new CMake flag ENABLE_V8. Eventually this flag will be removed.

Does using V8 mean there are more JS APIs available out of the box?

No. Like QuickJS, V8 provides a standard ECMAScript environment without additional APIs such as the Web Crypto API.
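As a quick illustration, a bare context exposes only language built-ins; anything beyond that is the embedder's responsibility. The probe below is a hedged sketch, run here under Node.js, which (unlike a bare V8 or QuickJS context) does layer such APIs on top of the engine:

```javascript
// Hedged probe: Math is a language built-in and exists in any ECMAScript
// environment; Web Crypto and fetch exist only if the embedder provides them
// (Node.js does; a bare V8 or QuickJS context does not).
const report = {
  math: typeof Math.max === "function",                        // always present
  webCrypto: typeof globalThis.crypto?.subtle !== "undefined", // embedder-provided
  fetch: typeof globalThis.fetch === "function",               // embedder-provided
};
console.log(report);
```

In a plain V8 context embedded in CCF, only the first entry would be true.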

Does V8 make implementing standard JS Web APIs easier/possible?

Yes, for two reasons: 1. there is a higher chance that CCF can re-use code from other V8-consuming projects; 2. supporting async is trivial (and has already been implemented), which makes it possible to implement APIs such as the Web Crypto API that rely heavily on that feature.

Is developing bindings for V8 different than in QuickJS?

Yes. There is generally more boilerplate, and also a different model based on templates (which improve performance). In my opinion, using (object) templates consistently and keeping each one in a separate file makes binding development very structured and simple, if slightly more verbose. It also makes things easier to understand in isolation, as each template defines what it wraps (native pointers) and is responsible only for its own object. Any nested objects are modelled as separate templates.

Will every JS test in the test suite of CCF be run against js_v8?

Ideally yes, but the current test infra doesn't easily allow this kind of parameterization. For now, selected tests are manually enabled for V8: e2e_logging and the logging perf test. The logging app covers a wide range of the JS API and should be sufficient for now to detect breakages.

Any app-specific features not implemented yet?

Yes, ccf.rpc, ccf.host, ccf.crypto, and a few other bindings are missing. See https://github.com/microsoft/CCF/issues/3312.

Any other things differing from QuickJS?

Yes, runtime constraints (max stack and heap size) are not enforced yet.

What's next?

Merging an initial version of https://github.com/microsoft/CCF/pull/3258/ minus any performance related work.

achamayou commented 2 years ago

@letmaik thank you for the excellent design notes.

In those conditions, the JIT kicks in, especially when the benchmark doesn't just run for a few microseconds.

I think you mean doesn't.

On the subject of Open Enclave compatibility, I suggest the following principles:

  • I see no obstacle to offering an alternative application runtime that does not target all platforms (i.e. does not work on SGX).
  • If this happens, it must be in addition to QuickJS in virtual mode, which remains a useful debugging facility for QuickJS on SGX.
  • Core usage of the JavaScript runtime (i.e. governance) must obviously work on all platforms, and there must be only one implementation of it. This can only adopt V8 once V8 works on all platforms that actively supported CCF releases target.

letmaik commented 2 years ago

In those conditions, the JIT kicks in, especially when the benchmark doesn't just run for a few microseconds.

I think you mean doesn't.

No, just badly phrased. I meant that the Bellard benchmarks (like other non-CCF benchmarks) use the JIT because the context is re-used.

On the subject of Open Enclave compatibility, I suggest the following principles:

  • I see no obstacle to offering an alternative application runtime that does not target all platforms (i.e. does not work on SGX).
  • If this happens, it must be in addition to QuickJS in virtual mode, which remains a useful debugging facility for QuickJS on SGX.
  • Core usage of the JavaScript runtime (i.e. governance) must obviously work on all platforms, and there must be only one implementation of it. This can only adopt V8 once V8 works on all platforms that actively supported CCF releases target.

Yep, I agree. I would still declare it experimental though for now.

jumaffre commented 2 years ago

In those conditions, the JIT kicks in, especially when the benchmark doesn't just run for a few microseconds.

I think you mean doesn't.

No, just badly phrased. I meant that the Bellard benchmarks (like other non-CCF benchmarks) use the JIT because the context is re-used.

There's also a "v8 - jitless" column in the Bellard benchmarks, which shows numbers of roughly the same order of magnitude as QuickJS. @letmaik Does your comment still hold against this benchmark?

letmaik commented 2 years ago

In those conditions, the JIT kicks in, especially when the benchmark doesn't just run for a few microseconds.

I think you mean doesn't.

No, just badly phrased. I meant that the Bellard benchmarks (like other non-CCF benchmarks) use the JIT because the context is re-used.

There's also a "v8 - jitless" column in the Bellard benchmarks too, which show numbers in roughly the same order of magnitude as QuickJS. @letmaik Does your comment still hold against this benchmark?

I was not referring to the jitless column, but V8 in CCF currently behaves like jitless, with similar performance, for the reasons I outlined above.

letmaik commented 2 years ago

@wintersteiger managed to fix the remaining OE issues, which means V8 runs in SGX as well now, single-threaded.

One (obvious) observation from running single-threaded is that there are no background jobs, which means V8 garbage collection runs synchronously as part of handling an RPC request; to be exact, during v8::platform::PumpMessageLoop(). Some requests will therefore suffer a latency spike, which should be measured in benchmarks (e.g. a latency histogram). Note that this is only relevant because isolates are re-used across requests; otherwise everything would be torn down per request anyway, likely leading not to GCs but to higher average latencies.

achamayou commented 2 years ago

Since we have not found a way to shorten context creation time and to match the combined latency and isolation offered by QuickJS, we will not move forward with V8 as a fully supported backend.

We are working actively on allowing business logic to run in external containers, connected to the CCF core over gRPC. This will provide greater isolation guarantees, and allow arbitrary languages, runtimes and dependencies to be used.