riscv / riscv-j-extension

Working Draft of the RISC-V J Extension Specification
https://jira.riscv.org/browse/RVG-128
Creative Commons Attribution 4.0 International

Feasibility study: using pointer masking for V8 Javascript Engine #7

Closed penguinwu closed 5 months ago

penguinwu commented 3 years ago

Why consider V8 as a potential use case?

One target scenario of pointer masking is to provide hardened, hardware-enforced isolation guarantees within a process. V8 provides lightweight isolation within a process, i.e., JavaScript code running in a V8 process conceptually cannot share data with other JS code running in the same V8 process. On the face of it, there seems to be potential to replace or augment the software-based V8 isolation mechanism with some hardware support. This issue explores the use case in depth.

Questions to be answered

The fact that V8 already has a software-based isolation mechanism means that the HW solution has to overcome additional hurdles to prove its benefits. It has to demonstrate that:

A primer on V8 isolation mechanism

You can view V8 as a multi-tenant runtime system that can run multiple JS programs in a single V8 process. V8 includes a common runtime (e.g., to initialize the VM and load all the pre-compiled built-in functions), a JIT compiler (called TurboFan), and an interpreter (called Ignition). At runtime, each tenant (an isolate) is initialized with its own heap and code cache (called Isolate cache).

The implementation of V8 is quite disciplined and ensures that data allocated by one isolate will never be passed to another isolate inside V8. More importantly, since JS does not support unsafe operations such as pointer arithmetic or unsafe type casting, there is no easy way for user code to break the lightweight isolation provided by the runtime. This is how V8 is able to provide lightweight isolation in software.

Under the hood, however, certain sharing still happens at the V8 level to make things run efficiently and reduce memory footprint (see this blog post on replacing per-isolate built-in functions with ones shared across isolates). For instance, all V8 isolates share the same compilation queue and compilation threads, GC threads, loaded pre-compiled built-in functions, and the on-disk code cache. Because of such high-level sharing, certain top-level data structures could also be shared across isolates. It would take a V8 expert to figure out all the places where such sharing happens.

We have a rough model of memory access patterns in common utility threads (e.g., JIT threads or GC threads) and in isolate threads.

Feasibility analysis

The fact that pointer masking applies to all load/store instructions can impose a major constraint on the software implementation. It means that once a thread (say, a thread executing an isolate) chooses to use pointer masking, all load and store instructions executed by this thread are limited to accessing the range of effective addresses imposed by the pointer masking.
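To make the constraint concrete, here is a minimal sketch of a base-and-mask style transformation applied to every address a thread uses. This is a simplified model for illustration only; the exact semantics in the draft specification may differ, and the names `apply_pointer_mask`, `BASE`, `MASK` and the 256 MiB region are assumptions, not taken from the spec or from V8.

```python
XLEN = 64
FULL = (1 << XLEN) - 1

def apply_pointer_mask(addr: int, base: int, mask: int) -> int:
    """Replace the address bits selected by `mask` with the same bits of `base`.

    Simplified model (assumption): bits outside `mask` pass through unchanged,
    so every resulting address falls inside the region described by `base`.
    """
    return ((addr & ~mask) | (base & mask)) & FULL

# Hypothetical 256 MiB isolate region starting at 0x4000_0000.
REGION_SIZE = 1 << 28
BASE = 0x4000_0000
MASK = FULL & ~(REGION_SIZE - 1)  # the upper address bits are forced to BASE's bits

def in_region(addr: int) -> bool:
    """True if `addr` lies inside the hypothetical isolate region."""
    return BASE <= addr < BASE + REGION_SIZE

# An address inside the region is left unchanged.
inside = apply_pointer_mask(0x4000_1234, BASE, MASK)
# An address outside the region is redirected into the region.
outside = apply_pointer_mask(0x7000_1234, BASE, MASK)

print(hex(inside), in_region(inside))
print(hex(outside), in_region(outside))
```

In this model the thread has no way to name an address outside the region, which is the property that makes the scheme attractive for isolation and restrictive for threads that legitimately touch several regions.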

In the case of V8, it is clear that common utility threads should not use pointer masking: because they may access memory regions belonging to several isolates, there may not be a suitable range with which to set the pointer masks.
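One way to see why a utility thread may lack a suitable range: if the masked range must be a power-of-two-sized, size-aligned region (an assumption of this sketch, not a statement about the draft spec), the smallest region covering several isolate heaps can be far larger than the heaps themselves. The helper name and the heap addresses below are hypothetical.

```python
def smallest_covering_region(ranges):
    """Return (base, size) of the smallest power-of-two-sized, size-aligned
    region that contains every half-open range [start, end) in `ranges`."""
    lo = min(start for start, _ in ranges)
    hi = max(end for _, end in ranges) - 1  # last byte that must be covered
    # The highest bit in which lo and hi differ decides the region size.
    bits = (lo ^ hi).bit_length()
    size = 1 << bits
    base = lo & ~(size - 1)
    return base, size

# Hypothetical heaps of two isolates, 256 MiB each.
heap_a = (0x4000_0000, 0x5000_0000)
heap_b = (0x9000_0000, 0xA000_0000)

# An isolate thread only needs its own heap: a tight 256 MiB region.
print([hex(v) for v in smallest_covering_region([heap_a])])
# A GC or JIT thread touching both heaps needs a 4 GiB region starting at 0,
# which also exposes everything mapped in between.
print([hex(v) for v in smallest_covering_region([heap_a, heap_b])])
```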

For isolate threads, we hope to be able to use pointer masking to limit the access range to the isolate's own heap. This is only feasible if V8 guarantees that an isolate thread never accesses memory outside its own isolate's memory. While in principle an isolate thread mainly accesses its own heap, it is not clear to me under which conditions it would never access memory outside that heap. Without such a guarantee, using pointer masking may not be sound.

Benefit analysis

Currently, V8's lightweight isolation is implemented by maintaining separate per-isolate memory regions and judiciously prohibiting sharing across isolates. There is no obvious additional runtime overhead in such an implementation, so I don't think a HW solution has a performance benefit.

On the front of security hardening, the benefit is less obvious for a managed language runtime like V8 that already has an isolation mechanism in place. Again this would require a true V8 insider to investigate.

Preliminary conclusion

Based on the analysis above, my current assessment is that V8 may not be a good use case for pointer masking, because it already has a SW-based isolation mechanism that incurs little additional overhead, and because pointer masking's subjecting all loads/stores in a thread to a masked range may be overly restrictive for the access patterns of V8 threads.

I think pointer masking may be more suitable for use cases where there is no feasible SW solution to guarantee isolation (i.e., for unsafe languages like C/C++).

ghost commented 3 years ago

It is still unclear to me if there is any shared data amongst isolates. It is clear that there is shared code, but shared data seems unlikely. Perhaps we can run some experiments to track data access patterns. This should be relatively easy to do in the simulator.

ghost commented 3 years ago

Accesses to data from the shared builtin code should all go through the root register, which points to a location in each isolate's heap.

To address both issues, we introduced an indirection through a dedicated, so-called root register, which holds a pointer into a known location within the current Isolate.

Metadata is also stored in each isolate's heap:

To preserve their metadata, each embedded builtin also has a small associated Code object on the managed heap, called the off-heap trampoline. Metadata is stored on the trampoline as for standard Code objects, while the inlined instruction stream simply contains a short sequence which loads the address of the embedded instructions and jumps there.

penguinwu commented 3 years ago

It is still unclear to me if there is any shared data amongst isolates. It is clear that there is shared code, but shared data seems unlikely. Perhaps we can run some experiments to track data access patterns. This should be relatively easy to do in the simulator.

@fw-brice It would be good to know for sure. The simulated build only executes jitted code; what about native code invoked via ecall by jitted code? A more thorough way to evaluate this is to get traces from QEMU for each isolate thread and then process the address ranges each thread accesses.
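A rough sketch of the post-processing step, assuming the trace has already been reduced to (thread id, address) pairs and that each isolate thread's heap range is known. This input format is hypothetical and is not a QEMU output format; the function names and sample addresses are likewise made up for illustration.

```python
def access_ranges_by_thread(trace):
    """Return {thread_id: (lowest_addr, highest_addr)} seen in the trace."""
    ranges = {}
    for tid, addr in trace:
        lo, hi = ranges.get(tid, (addr, addr))
        ranges[tid] = (min(lo, addr), max(hi, addr))
    return ranges

def escaping_accesses(trace, heaps):
    """Return the (thread_id, addr) pairs that fall outside the thread's own
    heap, where `heaps` maps thread_id to a half-open range [start, end)."""
    escapes = []
    for tid, addr in trace:
        start, end = heaps[tid]
        if not (start <= addr < end):
            escapes.append((tid, addr))
    return escapes

# Hypothetical sample: thread 1 touches an address inside thread 2's heap.
trace = [
    (1, 0x4000_0010),
    (1, 0x4000_0FF0),
    (2, 0x9000_0000),
    (1, 0x9000_0008),
]
heaps = {1: (0x4000_0000, 0x5000_0000), 2: (0x9000_0000, 0xA000_0000)}

print(access_ranges_by_thread(trace))
print(escaping_accesses(trace, heaps))
```

An empty result from `escaping_accesses` on a representative workload would be evidence, though not proof, that an isolate thread stays within its own heap.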

penguinwu commented 3 years ago

Summary of team discussion on 11/16: