contracts: Support RISC-V bytecode

athei commented 1 year ago

⚠️ : The support for RISC-V will only be in addition to the Wasm support. Wasm is not going anywhere. It is also a non breaking change. Meaning it does not matter which bytecode a contract uses. You can call it in the same way. That is true no matter how a contract is called (by another contract vs. by an extrinsic).

Here is a write up by @koute with more information: https://forum.polkadot.network/t/exploring-alternatives-to-wasm-for-smart-contracts/2434

Why we need a new bytecode

The idea of supporting an alternative to WebAssembly (wasm) in pallet-contracts has developed over the last couple of months. It started with discussions between various engineers. We came to the conclusion that wasm is not the optimal bytecode to formulate contracts in. It comes down to a few key insights from which a lot of consequences arise:

1) A stack machine does not work well for performance. A compiler that transforms it for a real-world machine (a register machine) either has non-linear compilation time (wasmtime) or produces slow code (wasmer). Both of them are severely behind in startup time when compared to even a non in-place interpreter (wasmi). Since startup time is as important as execution speed, we have stuck with an interpreter so far.
2) Wasm is complex. Due to its high-level structure, the code must be validated before it can be compiled and run. This validation can of course contain bugs that lead to catastrophic events. Compare that to a simple ISA (like RISC-V), which does not need global validation. Any invalid operation just traps deterministically.
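To illustrate the second point, here is a minimal sketch of what "no global validation" can look like in practice: each 32-bit instruction word is decoded on its own, and anything unrecognized becomes a trap. The names `Inst`, `Trap` and `decode` are hypothetical, only three RV32I instructions are covered, and this is not the decoder of any existing implementation.

```rust
/// A tiny, illustrative subset of RV32I instructions.
#[derive(Debug, PartialEq)]
enum Inst {
    Add { rd: u8, rs1: u8, rs2: u8 },
    Or { rd: u8, rs1: u8, rs2: u8 },
    Addi { rd: u8, rs1: u8, imm: i32 },
}

/// Hypothetical marker for "this instruction word is invalid".
#[derive(Debug, PartialEq)]
struct Trap;

/// Decodes a single instruction word without looking at any other part of
/// the program. Every word either decodes to a known instruction or traps.
fn decode(word: u32) -> Result<Inst, Trap> {
    // Field layout shared by the R-type and I-type formats.
    let opcode = word & 0x7f;
    let rd = ((word >> 7) & 0x1f) as u8;
    let funct3 = (word >> 12) & 0x7;
    let rs1 = ((word >> 15) & 0x1f) as u8;
    let rs2 = ((word >> 20) & 0x1f) as u8;
    let funct7 = word >> 25;

    match (opcode, funct3, funct7) {
        (0x33, 0x0, 0x00) => Ok(Inst::Add { rd, rs1, rs2 }),
        (0x33, 0x6, 0x00) => Ok(Inst::Or { rd, rs1, rs2 }),
        // I-type: the 12-bit immediate sits in bits 31:20 and is sign-extended.
        (0x13, 0x0, _) => Ok(Inst::Addi { rd, rs1, imm: (word as i32) >> 20 }),
        // Anything else is rejected the same way on every machine.
        _ => Err(Trap),
    }
}
```

For example, `decode(0x002081B3)` yields `add x3, x1, x2`, while an all-zero word yields `Err(Trap)`.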

The bottom line is that we want a different bytecode that does not have these properties. It was unclear so far whether we want to use an existing architecture or write our own (based on an existing design of course).

First trial: BPF

As a first attempt to check whether supporting a new bytecode is viable, we hacked together a node which supports BPF in pallet-contracts. I strongly recommend reading this report: https://forum.polkadot.network/t/ebpf-contracts-hackathon/1084

eBPF is an interesting target because it is designed to be trivially compilable to the architectures the Linux kernel supports: essentially just a mapping between instructions. The key insight is that a bytecode needs to be RISC and have a low general purpose register count at the same time. However, BPF has its problems. It is not designed for performance and the upstream LLVM backend doesn't compile all code. This is because it is designed for in-kernel use. Hence we can't use the stock Rust compiler.

While this was an interesting experiment it is probably not the bytecode we want to settle for.

Better: RISC-V

This is another bytecode that was floated as a candidate for a while. It is a logical choice: a modern clean-sheet design that is modular instead of incremental, just as wasm is. That is quite exceptional for a real-world architecture, and it allows us to only support the instructions we need for contracts. The only downside when compared to BPF is that it has 32 general purpose registers instead of 11. This is a problem as our main host architecture, amd64, has only 16 registers. This prevents us from mapping RISC-V registers 1:1 to native host registers. But being able to do that is what enables us to have the best of both worlds: compile as fast as wasmi while emitting code that performs in the same order of magnitude as native code.

However, after I reached out to @koute for help, he came to the conclusion that RISC-V is still a viable target and that we have the following options, all of which come with caveats:

1) Use the riscv32e target for contracts, which has a reduced register set (16 regs). This is the preferred solution. However, the LLVM backend for this target is not merged and hence it is not yet supported by upstream Rust.
2) While JITing we just spill the high registers to the stack. The impact on execution performance seems to be minimal as there are diminishing returns with more registers. However, we are interested in the worst case, and this might even be attackable by a malicious contract. Additionally, it adds complexity to the consensus critical JIT.
3) Add an offline post-processing step that transforms a riscv32i (32 regs) program to a riscv32e program. This would be added to cargo-contract. Since it happens offline it can do non-linear optimizations and register allocation. However, writing and maintaining this would probably be more work than just spilling the registers in the JIT. It might still be worth it to reduce the complexity of consensus critical code.
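Options 1 and 2 differ in where a guest register lives while the compiled code runs. The following is a rough sketch of the spilling idea from option 2, under stated assumptions: `Location`, `IN_HOST_REGS` and `locate` are hypothetical names, the split at 16 is only illustrative (a real JIT would likely need to keep some host registers for itself), and the host register index is an abstract number rather than a concrete amd64 register.

```rust
/// Where a guest (RISC-V) register lives while JIT-compiled code runs.
#[derive(Debug, PartialEq)]
enum Location {
    /// x0 always reads as zero in RISC-V, so it needs no storage.
    Zero,
    /// Held in a host register, identified by an abstract index.
    HostReg(u8),
    /// Spilled to memory at this byte offset from a hypothetical spill area.
    StackSlot(u32),
}

/// Assumed number of guest registers that fit into host registers.
const IN_HOST_REGS: u8 = 16;

/// Maps a guest register number to its location, or `None` if the number
/// is not a valid RV32I register (x0..=x31).
fn locate(guest_reg: u8) -> Option<Location> {
    match guest_reg {
        0 => Some(Location::Zero),
        // Low registers are mapped 1:1 and cost nothing extra to access.
        r if r < IN_HOST_REGS => Some(Location::HostReg(r)),
        // High registers get a 4-byte slot each; every access is a memory access.
        r if r < 32 => Some(Location::StackSlot(u32::from(r - IN_HOST_REGS) * 4)),
        _ => None,
    }
}
```

With riscv32e (option 1) the `StackSlot` case never occurs, which is why that target keeps the JIT simpler. With option 2, a contract that deliberately works in the high registers hits the slow path on every access, which is the worst case mentioned above.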

I cannot stress enough how instrumental @koute was for the research into RISC-V. He wrote a RISC-V to amd64 JIT in a day to prove that the plan to have a trivial JIT is viable. This is why we can be somewhat confident that RISC-V is the way forward.

This is the execution performance of that JIT (lower is better):

wasmi: 108ms
wasmer singlepass: 10.8ms
wasmer cranelift: 4.8ms
wasmtime: 5.3ms
koute JIT: 25ms

Keep in mind that zero optimization went into the JIT. It is a completely naive implementation just to prove that it works. It is reasonable to expect that we will eventually perform better than wasmer singlepass while having interpreter-style startup speeds.
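Read as ratios, the numbers above put the naive JIT at roughly 4.3 times faster than wasmi and roughly 2.3 times slower than wasmer singlepass. A small sketch that derives these ratios from the listed timings (the helper name `slowdown` is made up for this example):

```rust
/// Execution times from the list above, in milliseconds.
const TIMINGS_MS: [(&str, f64); 5] = [
    ("wasmi", 108.0),
    ("wasmer singlepass", 10.8),
    ("wasmer cranelift", 4.8),
    ("wasmtime", 5.3),
    ("koute JIT", 25.0),
];

/// Returns how many times slower `name` is than `baseline`
/// (a value below 1.0 means `name` is faster).
/// Returns `None` if either name is not in the table.
fn slowdown(name: &str, baseline: &str) -> Option<f64> {
    let find = |n: &str| {
        TIMINGS_MS
            .iter()
            .find(|(k, _)| *k == n)
            .map(|(_, ms)| *ms)
    };
    Some(find(name)? / find(baseline)?)
}
```

For instance, `slowdown("wasmi", "koute JIT")` gives 4.32, and `slowdown("koute JIT", "wasmer singlepass")` gives about 2.31.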

cc @pepyakin

Next Steps

[x] Grab the rv32e patch for LLVM, apply it and compile rustc that can emit rv32e code, and see how this affects performance, the size and JIT complexity. If this turns out to be very valuable we might want to fund the completion of the patch.
[x] Rig ink so that it can emit RISC-V: https://github.com/paritytech/ink/pull/1718
[ ] Add a host function in substrate to execute this bytecode: https://github.com/paritytech/polkadot-sdk/pull/3520
[ ] Make contracts pallet support this, and just see how a more real world use of it goes.
[ ] Write a spec for everything and further discuss the details, most likely while implementing a production-ready prototype (and there's a bit of stuff to decide here; e.g. the container to hold the bytecode [we probably don't want to use ELF], versioning, runtime memory layout, syscall interface, metering, etc.).

koute commented 1 year ago

Just a quick FYI to anyone reading: regarding my RISC-V experiment, I will soon post a more detailed writeup of my research into this and what I've learned (I just need to finish dealing with some higher priority issues first). The TLDR version is that the RISC-V target is very promising, we should explore it further, and in my opinion it is definitely a better target than eBPF in almost every aspect.

koute commented 1 year ago

Here's a link to the full writeup of my experience: https://forum.polkadot.network/t/exploring-alternatives-to-wasm-for-smart-contracts/2434

vivekvpandya commented 1 year ago

Can we make it directly support LLVM ByteCode?

koute commented 1 year ago

Can we make it directly support LLVM ByteCode?

I don't think that's a good idea. LLVM bytecode is significantly more complex, will arbitrarily change in the future (it's not a stable target like RISC-V is) and requires a costly register allocation step to JIT it into native code (since it's in SSA form). It essentially has all of the downsides of WASM, and more.

Lohann commented 1 year ago

Question: if the goal is to optimize smart-contract execution, I think you must also consider updating the SEAL interface to something more likely to support parallel smart-contract execution.

Solana has a solution called Sealevel, which basically allows them to deterministically execute smart-contracts in parallel:

The reason why Solana is able to process transactions in parallel is that Solana transactions describe all the states a transaction will read or write while executing. This not only allows for non-overlapping transactions to execute concurrently, but also for transactions that are only reading the same state to execute concurrently as well.

Support for parallelism must be thought about from the beginning, and honestly must be considered for the Runtime too; the support for sp-tasks was removed: https://github.com/paritytech/substrate/pull/12639

Lohann commented 1 year ago

~You have probably already explored this, but did you guys take a look at the Wasm3 M3: Massey Meta Machine architecture? Can't pallet-contracts use a similar approach?~

Nvm, it was already explored by wasmi: https://github.com/paritytech/wasmi/issues/314#issuecomment-1037130378

koute commented 1 year ago

Question: if the goal is to optimize smart-contract execution, I think you must also consider updating the SEAL interface to something more likely to support parallel smart-contract execution.

I'd say the main goal is simplification; extra performance is just a bonus.

but did you guys take a look at the Wasm3 M3: Massey Meta Machine architecture?

It's a neat idea for speeding up interpreters, but judging from the example generated machine code in the README it would most likely be significantly slower. (For that particular example my RISC-V recompiler can directly map a RISC-V `or` operation to the AMD64 `or` operation, generating a single machine instruction, while theirs generates 5, and one of them is a jump.)
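As a rough illustration of why such a mapping can be a single instruction, here is a sketch that emits AMD64 machine code for a RISC-V `or rd, rs1, rs2`. It is not the recompiler discussed here: `emit_rr` and `translate_or` are hypothetical names, the operands are assumed to be already mapped to host register numbers 0..=15, and the special handling of `x0` is left out.

```rust
/// Emits a register-to-register AMD64 instruction `op dst, src` on 32-bit
/// operands, for opcodes that use the `/r` ModRM encoding with the
/// destination in ModRM.rm (0x09 = OR, 0x89 = MOV).
fn emit_rr(opcode: u8, dst: u8, src: u8, out: &mut Vec<u8>) {
    assert!(dst < 16 && src < 16, "AMD64 has 16 general purpose registers");
    // REX.R extends ModRM.reg (src) and REX.B extends ModRM.rm (dst);
    // the prefix is only needed for registers 8..=15.
    let rex = 0x40 | ((src >> 3) << 2) | (dst >> 3);
    if rex != 0x40 {
        out.push(rex);
    }
    out.push(opcode);
    // mod = 0b11 selects the register-direct form.
    out.push(0xC0 | ((src & 7) << 3) | (dst & 7));
}

/// Translates RISC-V `or rd, rs1, rs2`, with all operands given as host
/// register numbers. AMD64 `or` has two operands, so a single instruction
/// suffices whenever the destination is also one of the sources.
fn translate_or(rd: u8, rs1: u8, rs2: u8) -> Vec<u8> {
    let mut out = Vec::new();
    if rd == rs1 {
        emit_rr(0x09, rd, rs2, &mut out); // or rd, rs2
    } else if rd == rs2 {
        emit_rr(0x09, rd, rs1, &mut out); // or rd, rs1 (OR is commutative)
    } else {
        emit_rr(0x89, rd, rs1, &mut out); // mov rd, rs1
        emit_rr(0x09, rd, rs2, &mut out); // or rd, rs2
    }
    out
}
```

For example, `translate_or(0, 0, 1)` produces the two bytes `09 C8`, which is `or eax, ecx`. Translation is a fixed amount of work per instruction, with no register allocation involved.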

Lohann commented 1 year ago

For that particular example my RISC-V recompiler can directly map a RISC-V `or` operation to the AMD64 `or` operation, generating a single machine instruction

Ok, now I'm confused. To be able to generate 1:1 machine code, you need to compile the RISC-V code into the host's machine code, and pallet-contracts itself is compiled using WasmTime. Because of that it is not possible to use inline assembly in any pallet; you can't even know what the host's architecture is from inside the runtime. I may be wrong, but in my understanding the only way to achieve near native performance using RISC-V for dynamic code is creating a new host function to compile it into host machine code (which may be safe if the compilation time is deterministic, with a 1:1 instruction mapping).

koute commented 1 year ago

is creating a new host function to compile it into host machine code

That's the plan, which is why it's important to keep it simple.

burdges commented 1 year ago

I believe state parallelism remains a general parachain/parathread issue, not smart contract specific. We'll simply cram more parachain blocks into the window between relay chain blocks, so those blocks would not themselves be parallel, aka they have a sequence, but they'll all have different approval checkers, so their workload becomes parallel to the relay chain and its validators. We'll hold inclusion and finality upon them all being included or approved.

athei commented 1 year ago

Please do not discuss parallelism here. It is completely orthogonal and not at all related to pallet-contracts. If we ever decide that we want parallel runtimes (AFAIK as of right now we don't) we can look into this. I know about Sealevel and the changes to our interface that would be required if that day comes. But it has nothing to do with RISC-V.

vivekvpandya commented 1 year ago

Is it possible to do the first step with gccrs, as GCC already has RVE support upstream? Also I read that RVE also changes stack alignment to 32 bits, I hope this is fine.

koute commented 1 year ago

Is it possible to do the first step with gccrs, as GCC already has RVE support upstream?

No. Right now gccrs can't even compile core. Maybe in the future, but definitely not in the near future. rustc_codegen_gcc could work, although it might require some extra work (I don't know whether anyone has tried to use it to generate RISC-V code, nor whether it would support RV32E out of the box or need extra patches).

Also I read that RVE also changes stack alignment to 32 bits, I hope this is fine.

Doesn't matter. In the current experiment we're not even using the native stack at all. (Which could be worse for performance, but it's nice due to its simplicity.)

athei commented 1 year ago

Getting the patch into LLVM seems much closer than getting gcc to work.

Polkadot-Forum commented 1 year ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/ebpf-contracts-hackathon/1084/13

Polkadot-Forum commented 9 months ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-release-analysis-v1-5-0/5358/1

athei commented 6 months ago

Some updates:

The PR to make PolkaVM available to pallet-contracts is open here: https://github.com/paritytech/polkadot-sdk/pull/3520
The rv32e changes were finally merged into LLVM. So we can compile for PolkaVM with a stock toolchain soon: https://github.com/llvm/llvm-project/pull/76777
The tests for pallet-contracts were converted from wat (wasm assembly) to Rust and are already compiled (but not run) for PolkaVM: https://github.com/paritytech/polkadot-sdk/pull/2654
Some runtimes can already be compiled to PolkaVM and the CI does just that: https://github.com/paritytech/polkadot-sdk/pull/3179 https://github.com/paritytech/polkadot-sdk/pull/3209
An executor for PolkaVM was merged that runs PolkaVM runtimes: https://github.com/paritytech/polkadot-sdk/pull/3458

paritytech / polkadot-sdk