lightclient opened this issue 5 years ago
This is great. A couple additions:
We should break a lot of this out even further into subtasks, etc. so we can begin assigning/finding people from the community to get involved here.
Also, "The biggest missing pieces to the runtime still are efficient bignum host functions and multi-proof dependent middleware for stateful execution. Although this won't be utilized by consensus nodes, it will be important for state providers and proof generation in general." - This assumes interpreters are truly the final choice/route. It would be nice to begin formalizing a decision here.
Great job @lightclient!
To expand more on the multi-proof section, I see two distinct areas of development.
The lowest level of the proof stack should be indifferent to the high-level data structures it maintains. The design space for { efficient | compact | static | deterministic | etc } proofs is rather large. There exist only keys `K` and values `V`. This is akin to the memory layout in the EVM, where the only concept of storage is a mapping of `u256` keys to `u256` values. Higher levels of abstraction can emulate more complex types (see Solidity's layout of state variables).
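To make this concrete, here is a minimal sketch (all names and the slot layout are illustrative, loosely mirroring Solidity's approach) of how a higher-level object could be emulated on top of a flat word-to-word store:

```rust
use std::collections::HashMap;

// The store's only notion of data: fixed-width words. (u128 is used
// here for brevity; a real store would use a 256-bit integer type.)
type Word = u128;

// A flat, untyped key-value store, analogous to EVM storage.
struct FlatStore(HashMap<Word, Word>);

// A higher-level object emulated by assigning each field a fixed slot
// relative to a base key, similar to Solidity's state variable layout.
struct Counter {
    base: Word, // slot where this object's fields begin
}

impl Counter {
    fn count(&self, store: &FlatStore) -> Word {
        *store.0.get(&self.base).unwrap_or(&0) // field 0 lives at base + 0
    }

    fn set_count(&self, store: &mut FlatStore, value: Word) {
        store.0.insert(self.base, value);
    }
}
```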
The minimal interface a low-level proof database should provide EE / contract developers is as follows:
```rust
fn get(&self, key: K) -> Option<V>;
fn set(&mut self, key: K, value: V) -> Result<Option<V>>;
fn root(&mut self) -> Result<V>;
fn from_bytes(bytes: Vec<u8>) -> Multiproof;
fn from_map(map: HashMap<K, V>) -> Multiproof; // only used by generation frameworks
```
Note: this should be used more for inspiration, as it is a Rust-specific interface; the true underlying interface would need to a) be language agnostic and b) minimize `memcpy` occurrences.
A critical attribute of the proof backend is its size when serialized and the time and space complexity of traversing it in serialized form. Since the operations defined in the interface above will be executed within a metered virtual machine where every operation counts, it's critical that the backend is as performant as possible. For example, throwing a proof into a standard hash table would likely incur a lot of extra computation we'd like to avoid (e.g. hashing the key to determine the location of the value).
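As a rough illustration of why the serialized form matters (the layout and entry sizes here are assumptions, not a proposal), a proof serialized as sorted fixed-size entries can be queried in place with a binary search, avoiding key hashing entirely:

```rust
/// A sketch of a serialized proof traversed in place: entries are
/// sorted by key, so lookup is a binary search over the raw buffer
/// rather than a hash of the key into a table.
struct SerializedProof<'a> {
    /// Concatenated fixed-size (key, value) pairs, sorted by key.
    bytes: &'a [u8],
}

const ENTRY: usize = 16; // 8-byte key + 8-byte value, for illustration

impl<'a> SerializedProof<'a> {
    fn get(&self, key: u64) -> Option<u64> {
        let n = self.bytes.len() / ENTRY;
        let (mut lo, mut hi) = (0, n);
        while lo < hi {
            let mid = (lo + hi) / 2;
            let off = mid * ENTRY;
            let k = u64::from_le_bytes(self.bytes[off..off + 8].try_into().unwrap());
            if k == key {
                let v = u64::from_le_bytes(self.bytes[off + 8..off + 16].try_into().unwrap());
                return Some(v);
            } else if k < key {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        None
    }
}
```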
Having a robust proof backend is great, but using it directly is not ergonomic from a developer's perspective. It is important that complex user objects can be expressed simply and mapped into a proof backend. It would be even better if the high-level wrapper were agnostic of the backend, so that backends could be swapped out on a whim. In order to achieve this, some compile-time processing needs to occur that translates user objects into a key-value memory layout that can be loaded into a multi-proof via the `from_map` interface.
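As a sketch of what that compile-time processing might emit (hand-written here; a real implementation would presumably be a derive macro, and the key assignment scheme is an assumption):

```rust
use std::collections::HashMap;

// Assumed key/value types for illustration.
type K = u64;
type V = u64;

struct Account {
    balance: V,
    nonce: V,
}

impl Account {
    // What a derive macro might generate: each field is assigned a
    // deterministic key relative to the object's base key.
    fn to_map(&self, base: K) -> HashMap<K, V> {
        let mut map = HashMap::new();
        map.insert(base, self.balance);   // field 0
        map.insert(base + 1, self.nonce); // field 1
        map
    }
}
```

The resulting map can then be handed to the backend's `from_map` to build a multi-proof covering exactly those keys.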
There are two conflicting use cases for the high-level wrapper: efficient computation within a metered virtual machine and simple proof generation on a local machine.
Runtime slowdowns are often caused by unnecessary `memcpy` calls. These can generally be avoided by building the wrapper as an interface to the proof backend, rather than as an actual object that stores values. An example of a wrapper in this case would be something that, at compile time, parameterizes the keys a structure would occupy in memory. Since this use case is only focused on on-chain execution, it is also assumed that the proof backend has been authenticated and that all values necessary to compute the transition are available (or can be calculated from a default value).
On the other end of the spectrum, the proof generator should have access to the high-level constructs of the language the structures are defined in (i.e. it should act as an actual object instead of an interface). This should make it simple to write generators which create the multi-proof that a transaction expects.
A valuable interface for these high-level constructs is the following function:

```rust
fn to_map(&self) -> HashMap<K, V>
```
Things get more complex once the high-level structures are used inside other structures that implement the interface.
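One plausible way to handle that nesting (purely a sketch; the re-basing scheme is an assumption) is for each structure to emit its map relative to a base key, with the parent re-basing its children's keys:

```rust
use std::collections::HashMap;

type K = u64;
type V = u64;

struct Balance {
    amount: V,
}

impl Balance {
    fn to_map(&self, base: K) -> HashMap<K, V> {
        HashMap::from([(base, self.amount)])
    }
}

struct Wallet {
    nonce: V,
    balance: Balance, // nested structure implementing the same interface
}

impl Wallet {
    fn to_map(&self, base: K) -> HashMap<K, V> {
        let mut map = HashMap::from([(base, self.nonce)]);
        // Re-base the child's keys under this object's key space.
        map.extend(self.balance.to_map(base + 1));
        map
    }
}
```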
I'm starting to think about how to manage state in the Eth1 EE and a lot of what you wrote here would be relevant to that. It'd be great if we can find a "proof middleware" that could abstract over the exact type of tree/proof format and allow for efficient authentication, data lookup, and state modification.
Just to make sure I've got the general idea: the goal is to have a library (let's say in Rust) which EE and state provider developers would use for the tree/proof part of the logic. I'm mostly interested in the EE itself, so I'll focus on that from now on.
This library has two APIs. It's clear what the low-level API is doing, but just to make sure I wanted to ask: these methods are not host functions, right?
I'm not sure I got exactly what goes in the high-level API?
> It's clear what the low-level API is doing, but just to make sure I wanted to ask: these methods are not host functions, right?
Yep, you're totally correct.
> I'm not sure I got exactly what goes in the high-level API?
I don't think I laid it out very clearly, so that's probably why it isn't making much sense. I would imagine two "versions" of the same type:
An interface type that holds only a pointer to the beginning of an `Account` in memory and exposes getter / setter functions. This way EEs can minimize their copying.
```rust
struct Account(*const u8);

impl Account {
    // Getter bodies elided in this sketch; each would read directly
    // from the underlying buffer without copying.
    fn balance(&self) -> &U256;
    fn nonce(&self) -> &u64;
    fn signature(&self) -> &[u8; 96];
}
```
A "stateful" type which can be used by proof generation tools to mock out values and convert them to proofs.
```rust
struct Account {
    balance: U256,
    nonce: u64,
    signature: [u8; 96],
}
```
The interface I defined to convert from high-level objects to proofs is the `to_map` function, which can then be consumed by the proof's `from_map`. This seems to work okay for homogeneous values, but less well otherwise. There are lots of details still to work out, and I'm sure there are many improvements that could be made. I know in Rust these things can be achieved at compile time via procedural macros, and I did some initial work over the summer that served as some inspiration for this.
This is kind of the feel that I want (I tried to make it rusty, but I'm sure it wouldn't compile):
```rust
let engine = Engine::new();
let wallet1 = engine.deploy(WalletContract::new());
let wallet2 = engine.deploy(WalletContract::new());
let dai = engine.deploy(DaiContract::new());

DaiContract::set_balance(dai, wallet1.address(), 100);

let tx = wallet1.call(dai.address(), DaiContract::Send(wallet1.address(), wallet2.address(), 50));
let new_state = engine.execute(vec![tx]);

assert_eq!(DaiContract::from_state(new_state).balance(wallet2.address()), 50);
```
Some interfaces that would be needed for this to work:
```rust
// `Transaction` and `Address` are assumed to be defined elsewhere.
trait Contract<K, V> {
    fn asm() -> Vec<u8>;
    fn to_map(&self) -> HashMap<K, V>;
    fn address(&self) -> Option<Address>;
}

trait Engine<K, V> {
    fn deploy<C: Contract<K, V>>(&mut self, c: C) -> &C;
    fn execute(&mut self, txs: Vec<Transaction>);
}

trait Proof<K, V> {
    fn to_map(&self) -> HashMap<K, V>;
}

trait ProofBackend<K, V> {
    fn from_map(&mut self, map: HashMap<K, V>);
}
```
Stateless execution is one of the defining features of Ethereum 2.0. It will alleviate the I/O bottleneck faced by Ethereum 1.0 clients and minimize the amount of data validators need in order to begin validating a new shard. Although it solves some issues, it raises just as many. Here are some characteristics that are important for a robust stateless system:
Since May, we've been exploring many different facets of stateless systems including compile-time reflection of Rust structs to sparse merkle trees, efficient in-place lookup strategies for sparse merkle trees, single-pass authentication of merkle proofs, atomic modification of proofs in nested execution, a practical example of a stateless token execution environment (and required tooling), and general designs for a stateless execution environment for user contracts.
As we look towards prototyping an umbrella execution environment, we think it is time to try formalizing some of this toolchain so that more work can begin in parallel and effort is not duplicated across projects and organizations. The main pieces of the toolchain that we see are:
Runtime
The runtime environment is the base of the whole toolchain. Its semantics define how the multi-proof backends operate, which in turn affects each subsequent aspect of the toolchain. The Ewasm team has done a lot of work defining this engine within the bounds of both Ethereum 1.0 and Ethereum 2.0. We have also begun to experiment with certain runtime heuristics in the ewasm-rt, such as wasm-spawning-wasm and atomic proof modification. These areas require more research, but are advancing. The biggest missing pieces in the runtime are still secure metering solutions, efficient bignum host functions, and multi-proof-dependent middleware for stateful execution. Although the middleware won't be utilized by consensus nodes, it will be important for state providers and proof generation in general.
Authenticated multi-proof backend
The multi-proof system is essentially the backbone of all stateless computation. There are numerous flavors of proof systems (sparse, SSZ sparse, Patricia, Knuth binary, etc.) and even more ways of serializing them. The actual time and size complexity of each proof system varies and should be subject to more rigorous research and improvement. All of these systems can be abstracted, in terms of the toolchain as a whole, by thinking of them as authenticated key-value stores.
They act as an integrity shim on top of the actual data values. The runtime should generally be agnostic of the semantics of the proof backend, but it must make an exception in order to provide a stateful executor. In particular, it must a) maintain a list of `get` and `set` operations against the stateful data store, which can later be used to construct the necessary multi-proof, and b) support middleware from the proof library which allows for runtime decisions (e.g. what should a `get` return for a key `K` that hasn't been "officially" initialized yet?). A sketch of such middleware follows below.

Developing libraries that provide all the necessary tools is a large effort. I discovered this first hand while developing a tool for composing proofs for `sheth`. Most of my work has revolved around implementations of sparse trees, so there may be additional things to consider for other types of trees, but the two areas of work here are a) compile-time reflection of data structures to generalized indices and b) efficient multi-proof libraries for generalized-index-based proofs.
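Here is a minimal sketch of such recording middleware (all names are assumptions; the default-value policy is exactly the kind of runtime decision mentioned above):

```rust
use std::collections::{HashMap, HashSet};

type K = u64;
type V = u64;

// Wraps a plain key-value store and records every key touched by
// `get` / `set`, so the access list can later drive multi-proof
// construction.
struct Recorder {
    store: HashMap<K, V>,
    touched: HashSet<K>,
}

impl Recorder {
    fn get(&mut self, key: K) -> V {
        self.touched.insert(key);
        // Runtime decision: an uninitialized key reads as the default value.
        self.store.get(&key).copied().unwrap_or_default()
    }

    fn set(&mut self, key: K, value: V) {
        self.touched.insert(key);
        self.store.insert(key, value);
    }

    // The keys the multi-proof must cover for this execution.
    fn access_list(&self) -> &HashSet<K> {
        &self.touched
    }
}
```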
Orchestration tools
Until now, we've been rolling our own orchestration tools. As we start looking towards building smart contracts on top of an execution environment that itself needs to be deployed to a runtime, the need to consolidate these tools becomes clearer. In fact, even the existing tools combined are not adequate for the developer experience we are striving for. I believe we are in desperate need of both a testing framework for stateless contracts and a simulator for long-running processes. In both instances, it's important that the tools are highly interoperable with the proof libraries so that users can mostly ignore their existence.
I think Truffle is a great baseline for the testing DevEx that should be provided for stateless contracts.
In order to achieve that level of abstraction from the proof backend, it is critical to thoroughly define a standard interface for proof libraries.
EE & smart contract languages
Just as most developers don't program in x86 assembly or EVM assembly anymore, we need to investigate high-level solutions for writing EEs and smart contracts. Fortunately, WebAssembly is an LLVM target, so most languages already compile to it. That is only about a quarter of the story, though. Many languages have their own runtimes, garbage collectors, and debugging infrastructure that get included in binaries. However, writing EE / smart contract code is analogous to embedded systems engineering, where every clock cycle and byte matters. We need to develop libraries for these languages and tools such as Chisel to support efficient execution and small bundle sizes. Additionally, multi-proof libraries must be written / implemented natively for other languages.
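For a feel of what that looks like in practice, here is a minimal `no_std` Rust module of the kind that compiles to a small wasm binary (the host function and its module name are hypothetical; in a real EE these imports are defined by the runtime specification):

```rust
#![no_std]
#![no_main]

// Hypothetical host interface provided by the runtime.
#[link(wasm_import_module = "env")]
extern "C" {
    fn host_get(key: u64) -> u64;
}

#[no_mangle]
pub extern "C" fn main() -> u64 {
    // No standard library, allocator, or unwinding machinery is linked
    // in, so every byte in the resulting binary is accounted for.
    unsafe { host_get(0) + 1 }
}

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}
```

Building with e.g. `cargo build --target wasm32-unknown-unknown --release` then yields a binary small enough to meter and deploy sensibly.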
Client libraries
The final piece of the toolchain puzzle is the client libraries that link application developers with users. Fortunately, there is already a lot of existing work that can be leveraged. For instance, if the other pieces are done correctly, it should be fairly easy to extend `web3.js` to first send a transaction to a state provider to calculate the proof before submitting it to the eth 2 mempool. It's likely that most of the work will be a) supporting the different flavors of execution environments and b) ironing out the incentive schemes for state providers.