open-web3-stack / XCQ

Cross-Consensus Query Language for Polkadot
Apache License 2.0

Research: PolkaVM #1

Closed xlc closed 6 days ago

xlc commented 3 weeks ago

Research the capabilities of PolkaVM to ensure it meets all the requirements for XCQ usage, and document the usage. https://github.com/koute/polkavm https://github.com/paritytech/rustc-rv32e-toolchain

xlc commented 2 weeks ago

Related https://github.com/koute/polkavm/issues/114

Need to figure out how to do memory management, or how to avoid doing it.

indirection42 commented 1 week ago

Getting Started

Prerequisites

Run PoC

  1. Install polkatool^1 (for relinking a standard RV32E ELF into a .polkavm blob) and chain-spec-builder^2 (for building a chainspec from a wasm): make tools
  2. Build a PolkaVM guest program^1: make poc-guest
  3. Run the PoC and the expected runtime structure:
    • Run a simple host program that executes the guest program (with trace turned on): make poc-host
    • Run a runtime with an execute_query API that executes the guest program bytes via chopsticks: make run

Explanations

How does the guest program communicate with the host?

PolkaVM takes an approach similar to WASM for letting the guest access host functions.^3 In the guest program, host function declarations are annotated with PolkaVM's proc-macro polkavm_import, and the definitions of guest functions are annotated with polkavm_export. In the host program, we register host functions through linker.func_wrap. Due to ABI limits, the signatures of these functions are restricted to primitive numeric types such as u32, i32, and u64 (represented by two u32 registers).
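To make the "u64 represented by two u32 registers" point concrete, here is a minimal std-only sketch of splitting and rejoining a 64-bit value. The function names are hypothetical, and the choice of putting the low half first is an assumption for illustration, not something confirmed from PolkaVM's ABI.

```rust
/// Split a u64 into (low, high) 32-bit halves, one per 32-bit register.
/// Assumption: low half first; the real ABI ordering may differ.
pub fn split_u64(value: u64) -> (u32, u32) {
    // `as u32` truncates, keeping only the low 32 bits.
    (value as u32, (value >> 32) as u32)
}

/// Rebuild the u64 from its (low, high) halves.
pub fn join_u64(low: u32, high: u32) -> u64 {
    (u64::from(high) << 32) | u64::from(low)
}
```

Either side of the call boundary can apply the same pair of helpers, as long as both agree on which register carries which half.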

How to pass bytes from host to guest and vice versa?

In general, we can pass bytes between host and guest via the guest's stack or heap.^4 The stack size of a guest program is 64KB, and the heap size is less than 4GB.
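As a rough illustration of passing bytes as a (pointer, length) pair, the sketch below models guest memory as a plain byte buffer with bounds-checked copies. GuestMemory is a hypothetical stand-in written for this note; it is not a PolkaVM type, and the real host API for reading and writing guest memory may look different.

```rust
/// Hypothetical stand-in for a guest's linear memory; not a PolkaVM type.
pub struct GuestMemory {
    bytes: Vec<u8>,
}

impl GuestMemory {
    /// Create a zero-filled memory of `size` bytes.
    pub fn new(size: usize) -> Self {
        Self { bytes: vec![0; size] }
    }

    /// Host -> guest: copy `data` into guest memory starting at `ptr`.
    pub fn write(&mut self, ptr: u32, data: &[u8]) -> Result<(), &'static str> {
        let start = ptr as usize;
        let end = start.checked_add(data.len()).ok_or("range overflow")?;
        // `get_mut` returns None when the range leaves the buffer.
        let dst = self.bytes.get_mut(start..end).ok_or("out of bounds")?;
        dst.copy_from_slice(data);
        Ok(())
    }

    /// Guest -> host: copy `len` bytes out of guest memory starting at `ptr`.
    pub fn read(&self, ptr: u32, len: u32) -> Result<Vec<u8>, &'static str> {
        let start = ptr as usize;
        let end = start.checked_add(len as usize).ok_or("range overflow")?;
        self.bytes
            .get(start..end)
            .map(|slice| slice.to_vec())
            .ok_or("out of bounds")
    }
}
```

Only the two u32 values (address and length) would cross the call boundary; the bytes themselves are copied by whichever side owns the memory access.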

Specific Usages in Detail:

How to pass non-primitive data types between guest and host?

Basically, if a data type contains no objects on the heap, a byte-for-byte copy is enough, provided that guest and host use the same layout for the type so that the data is interpreted correctly.
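A minimal sketch of such a byte-for-byte copy, assuming a hypothetical QueryArgs type made up for this example. #[repr(C)] pins the field order so that both sides can agree on the layout, and the two u32 fields leave no padding bytes.

```rust
use std::mem::size_of;

/// Hypothetical argument type; `repr(C)` fixes the field order and layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct QueryArgs {
    pub a: u32,
    pub b: u32,
}

/// Size of the type in bytes (8: two u32 fields, no padding).
pub const ARGS_SIZE: usize = size_of::<QueryArgs>();

/// Copy the raw bytes of `value` into a plain byte array.
pub fn to_bytes(value: &QueryArgs) -> [u8; ARGS_SIZE] {
    let mut out = [0u8; ARGS_SIZE];
    // SAFETY: QueryArgs is Copy, repr(C), and has no padding, so all
    // ARGS_SIZE bytes are initialized; source and destination do not overlap.
    unsafe {
        std::ptr::copy_nonoverlapping(
            value as *const QueryArgs as *const u8,
            out.as_mut_ptr(),
            ARGS_SIZE,
        );
    }
    out
}

/// Reinterpret the bytes as a QueryArgs value.
pub fn from_bytes(bytes: &[u8; ARGS_SIZE]) -> QueryArgs {
    // SAFETY: every bit pattern is a valid QueryArgs, and read_unaligned
    // tolerates the byte array's 1-byte alignment.
    unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const QueryArgs) }
}
```

This only works because the type owns no heap data; a Vec or Box field would copy a pointer that is meaningless in the other address space.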

References

PolkaVM is a general-purpose, user-level, RISC-V-based virtual machine.

For more details, please refer to PolkaVM Announcement on Polkadot Forum

xlc commented 1 week ago

The current PoC program is big (>2k bytes), and I think this is due to the use of Box. Need to spend a bit more time to see if we can do raw memory operations without Box and reduce the program size. Also, maybe I shouldn't just guess why it is large. Need a way to figure out what contributes to the program size. The disassembler provided by polkatool could be helpful.

indirection42 commented 1 week ago

> The current poc program is big (>2k bytes) and I think it is due to the usage of Box. Need to spend a bit more time to see if we can do raw memory operation without using box and reduce the program size. Also maybe I shouldn't just guess why it is large. Need a way to figure out what contributes to the program size. The disassembler provided by polkatool could be helpful.

I delegated memory allocation and writes to the host. As a result, the generated guest program is now 235 bytes. https://github.com/open-web3-stack/XCQ/blob/828857de418d3731f3d3424cec6163ff0891739c/poc/guest/src/main.rs#L11-L16 https://github.com/open-web3-stack/XCQ/blob/828857de418d3731f3d3424cec6163ff0891739c/poc/host/src/main.rs#L5-L30
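For readers who want the general shape of host-side allocation without opening the links, here is a std-only sketch of an sbrk-style bump allocator. It is not the code from the linked commits: GuestHeap and its methods are hypothetical names, and the heap is simulated with a Vec rather than real guest memory.

```rust
/// Hypothetical host-side model of a guest heap; names are illustrative.
pub struct GuestHeap {
    base: u32,      // guest address of the first heap byte
    limit: u32,     // maximum heap size in bytes
    bytes: Vec<u8>, // simulated heap contents, grown on demand
}

impl GuestHeap {
    pub fn new(base: u32, limit: u32) -> Self {
        Self { base, limit, bytes: Vec::new() }
    }

    /// sbrk-style bump allocation: grow the heap by `size` bytes and
    /// return the guest address of the newly added region.
    pub fn alloc(&mut self, size: u32) -> Option<u32> {
        let used = self.bytes.len() as u32; // never exceeds `limit`
        let new_used = used.checked_add(size)?;
        if new_used > self.limit {
            return None;
        }
        let addr = self.base.checked_add(used)?;
        // The end of the region must also fit in the 32-bit address space.
        self.base.checked_add(new_used)?;
        self.bytes.resize(new_used as usize, 0);
        Some(addr)
    }

    /// Allocate room for `data`, copy it in, and return (ptr, len),
    /// the two u32 values that would be handed to the guest.
    pub fn alloc_and_write(&mut self, data: &[u8]) -> Option<(u32, u32)> {
        if data.len() > u32::MAX as usize {
            return None;
        }
        let len = data.len() as u32;
        let ptr = self.alloc(len)?;
        let start = (ptr - self.base) as usize;
        self.bytes[start..start + data.len()].copy_from_slice(data);
        Some((ptr, len))
    }
}
```

Keeping allocation on the host side means the guest needs no allocator of its own, which is consistent with the size reduction reported above.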

xlc commented 1 week ago

Followup questions:

For future considerations:

indirection42 commented 1 week ago

For question 1: Yes, we can. I just found that polkavm_derive::sbrk does sbrk directly in the guest.

For question 2: Since the main purpose of a host call is to provide input and get output, it doesn't make sense to write data the guest already knows to an address the guest also knows. However, I'm not sure whether we will have use cases that need to share data between different guest functions in the future. For now, I think you're right.

For question 3: Yes, passing a pointer is easier, but I'm still not sure that all the use cases are covered. I will do a PoC and think more. If necessary, I will have a huddle with you.

For question 4: I'm not sure for now. It seems the stack size is 64KB and the heap size is dynamic, roughly 4GB minus the other sections. I will check. I also think there is a use case that requires heap memory allocation: (TODO)

indirection42 commented 1 week ago

I did a PoC for passing a custom type that keeps most data on the stack, including the value returned from the guest. I think this should be fine as long as the host copies the data into its own space in time. The exception is that the args passed from host to guest at the entrypoint are on the heap. https://github.com/open-web3-stack/XCQ/blob/ae7dbc1e97f68f56542baedbe1ce59250583057e/poc/guest/src/main.rs#L21-L39 https://github.com/open-web3-stack/XCQ/blob/ae7dbc1e97f68f56542baedbe1ce59250583057e/poc/host/src/main.rs#L5-L44