querycert / qcert

Compilation and Verification of Data-Centric Languages
https://querycert.github.io/
Apache License 2.0
56 stars 9 forks source link

GSoC 2020: Webassembly backend #147

Closed pkel closed 3 years ago

pkel commented 4 years ago

I worked on translating Ergo smart contracts to Webassembly (Wasm) as part of Google Summer of Code (GSoC) 2020. You can find an overview of my project in PR #777 on the Ergo repo. TL;DR: Ergo adds a smart contract language frontend to the Q*cert compiler. I extend Q*cert with a proof-of-concept Webassembly backend.

This PR constitutes my GSoC project submission. It's meant to be frozen on the GSoC deadline on 8/31/2020. Future edits should go to Jerome's working PR #142.

In the following, I will try to wrap up some technical aspects of my implementation.

Overview

The Wasm specification contains a reference implementation in OCaml. I implement a translation from Imp(Ejson) to the Wasm reference Ast. We also hook up the reference interpreter as Q*Cert evaluation function for testing. All my code is OCaml and not verified. Jerome did some plumbing on the Coq side.

Preliminaries

Code generation and evaluation

Code generation and evaluation is implemented in /compiler/wasm/. The main component is the functor in wasm_backend.ml which is instantiated in wasm_ast.ml. The latter provides all functions necessary to extend Q*cert with a new Wasm_ast language target. The functorization is needed since Ergo's extracts its own Imp and OCaml cannot infer their equality.

The translation functions in wasm_backend.ml rely on an incomplete intermediate representation for Wasm (wasm_ir.mli). In Wasm, functions, types, and variables are addressed by their integer index within the module. The IR replaces these integers with OCaml variables for more convenient addressing. It also provides concise constructors for Wasm AST elements.

Another convenience module is wasm_util.ml. It contains a small lookup table implementation based on Hashtable which is used all over the place to get from OCaml values to integer indexes.

In contrast to Imp's block-scoped variables, Wasm has only function-scoped variables. The translation in wasm_backend.ml does not take this into account and relies on unique variable names. We pre-process the Imp using the translation in wasm_imp_scoping.ml in order to resolve this conflict (see #145).

I implement a binary encoding for EJson values in wasm_binary_ejson.ml. This encoding is used for storing constants in the linear memory of the generated Wasm module. It's also used at runtime for communication between the interpreter and the runtime module (see Ergo PR #777). I briefly describe the encoding in the header of source file.

Runtime

The generated wasm module relies on a Wasm runtime for execution (see Ergo PR #777). This runtime is implemented in Assemblyscript (NPM ecosystem) and lives in /runtimes/assemblyscript/. We build the runtime with npm run asbuild:untouched and run some tests with npm test. The Assemblyscript source resides in assembly/index.ts, the compiled Wasm in build/untouched.wasm.

We add a dune rule that encodes the binary runtime build/untouched.wasm into an wasm_runtime.ml file and bundles it into wasm_backend.ml.

Tests

We provide some rudimentary tests under /tests/imp_ejson/ which can be run all at once using the command make -C tests wasm-imp-spec. The tests evaluate pieces of Imp(Ejson) on a given input in order to produce the expected output. Then the Imp code gets translated to Wasm and executed with the same input on the Wasm reference interpreter. The resulting output is compared with the expected output (should be equal).

Issues

pkel commented 3 years ago

Some other pages link to this PR as my GSoC project report. Let's keep it as is. We can track progress in #153.