sampsyo / cs6120

advanced compilers
https://www.cs.cornell.edu/courses/cs6120/2023fa/
MIT License
709 stars 157 forks source link

Proposal: RISC-V Binary Translation to WebAssembly (WASM) #394

Open evanmwilliams opened 9 months ago

evanmwilliams commented 9 months ago

What will you do?

We will design and implement a binary translator that converts RISC-V binaries into WebAssembly (WASM) binaries. This will enable RISC-V software to be executed in any environment with a WASM runtime, including web browsers, servers, and certain embedded systems.

How will you do it?

First, we'll need to familiarize ourselves with the RISC-V and WASM specifications. The three of us are quite familiar with RISC-V already, but we are less familiar with WASM. We'll pick a suitable subset of RISC-V instructions that we can then translate into WASM. The initial subset will just ensure we can get out a proof of concept. We'll extend the subset as time allows to be more ambitious.

We'll need to develop a parser for RISC-V binaries. We've written interpreters for RISC-V before in other courses, so this should not be too involved. It might take a bit more work to design good data structures to store RISC-V instructions and their operands. Then, we'll create a mapping schema to transform each RISC-V instruction (or a series of instructions) into its corresponding Wasm representation. Some mappings may be direct, while others might require innovative solutions due to architectural differences.

As a more lofty objective, we'll also try to resolve some of the differences between the memory models of the two ISAs. We'll need to emulate RISC-V's memory model within Wasm's linear memory. To deal with system calls, we'll need to implement shims or stubs to emulate or handle RISC-V system calls within the Wasm environment. Our mapped instructions can then be used to produce WASM binaries.

If time permits, we will explore optimizations to improve the performance or reduce the size of the translated WASM binaries. Some of these optimizations can be more classical compilers techniques like dead code elimination or loop unrolling. Other more tailored optimizations might include profiling for hotspot optimization and tailoring optimizations for specific WASM runtimes.

As a sidenote, we're planning on doing all of this in Rust. Two of us are pretty experienced with Rust already, and the third is eager to learn. Seems fitting, given some of the conversations we've had in class recently!

How will you empirically measure success?

First we'll ensure functional correctness by developing a suite of RISC-V binaries that can be easily verified. Then, we'll do the translation to WASM and verify that the outputs are the same. This is the baseline of correctness. If we implement a more expansive subset of RISC-V, we might be able to test on actual benchmark suites that have been developed. The other way we can empirically measure success is by observing the performance. We expect that the WASM binaries are reasonably performant, for some definition of performant. This might show itself in the size of the generated code or the speed of the executed binary given a certain set of constraints. In our final report, we'll do a deep dive into at least one realistic program that is actually useful to translate into WASM.

Team members: @emwangs @he-andy @evanmwilliams

sampsyo commented 9 months ago

Sounds like fun, and also quite a lot of work! I'm glad there are 3 people on your team. :smiley:

One very big challenge you are going to encounter is that WebAssembly uses structured control flow. That is, it has constructs like if and while but not arbitrary branch/jump instructions. Of course, RISC-V binaries have arbitrary (possibly non-reducible) CFGs. This is a huge headache for many compilers that target WebAssembly.

There are a few ways to deal with this limitation, and you'll need to pick one. You can pick something simple and inefficient, such as a level of indirection through a big wrapper while. Or you could use the famous "Relooper" algorithm that appears in many wasm-related tools, or you can use this somewhat refined version by Ramsey published in 2022.

Other nits:

Can you please answer these important questions in the next ~week, before you get started on the project? Even provisional answers are OK, but it is important to have answers to these up front: