pepyakin opened this issue 2 years ago
Another disadvantage of the current approach is that we can't effectively make any non-trivial, behavior-affecting changes to the current allocator. There are various improvements to the allocator's internals that would be nice to make; however, due to backward-compatibility concerns, doing so carries a lot of risk.
Since the allocator is provided by the host, any change to its behavior will affect all historical blocks, and those blocks might depend on the allocator's exact behavior due to Hyrum's law. Even if we can guarantee that our chains won't be affected, we can't do that for every other parachain and project building on Substrate.
Of course, an alternative to this would be to just use our existing host-function versioning machinery and provide a `malloc_v2`, etc., but then we'd have to maintain two allocators. And what if we want to make further improvements in the future? Are we going to maintain 10 copies of different allocators?
So I'm highly in favor of this. There might have been good reasons to keep the allocator on the host's side in the past; today I don't really see why we wouldn't want to move it into the runtime.
How would the host allocate the return value of a host function?
> How would the host allocate the return value of a host function?
That is a good question, and it is currently one of the biggest issues we'd need to resolve, since some types marshaled across the FFI boundary are allocated by the client itself directly inside WASM's linear memory.
Some random ideas off the top of my head (not all are necessarily great ideas; just typing out whatever comes to my mind):
- Use the `memory.copy` instruction from the bulk memory operations extension.

It should be doable. Of course, we'd need to make sure this doesn't regress performance and that whatever we pick will be maintainable into the future.
This idea is not new; back in the day, I thought we would go with something like what @koute described in the 2nd point. Specifically, I imagined that the host API would gain a function to initialize the allocator. That function accepted function pointers to `malloc` and `free` (and today I would maybe add `realloc`). The runtime would then be expected to initialize the allocator before calling any host functions that perform allocations. An alternative would be a similar construction, but with the host finding the allocator functions itself by looking up exports by name. That works too, and I now think it is more appropriate in the current setting than I used to.
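The "host finds the allocator by exported name" alternative can be sketched in pure Rust. This is not a real wasm embedder; `Instance`, `host_return`, the export table, and the bump allocator are all illustrative stand-ins. The point is only the flow: the host resolves an `alloc`-like export and uses it to place a host-function return value inside the runtime's own heap.

```rust
use std::collections::HashMap;

// Stand-in for a typed wasm export: in a real embedder this would be a
// function resolved from the instantiated module by name.
type AllocFn = fn(&mut Vec<u8>, usize) -> usize;

// Naive bump allocation at the end of the "linear memory".
fn runtime_alloc(heap: &mut Vec<u8>, size: usize) -> usize {
    let ptr = heap.len();
    heap.resize(ptr + size, 0);
    ptr
}

struct Instance {
    heap: Vec<u8>,                       // stands in for wasm linear memory
    exports: HashMap<&'static str, AllocFn>,
}

impl Instance {
    // Host side: write a host-function return value into runtime memory
    // using the runtime's exported allocator instead of a host allocator.
    // Returns `None` if the runtime does not export `alloc`.
    fn host_return(&mut self, data: &[u8]) -> Option<usize> {
        let alloc = *self.exports.get("alloc")?;
        let ptr = alloc(&mut self.heap, data.len());
        self.heap[ptr..ptr + data.len()].copy_from_slice(data);
        Some(ptr)
    }
}

fn main() {
    let mut instance = Instance {
        heap: Vec::new(),
        exports: HashMap::from([("alloc", runtime_alloc as AllocFn)]),
    };
    let ptr = instance.host_return(b"return value").unwrap();
    assert_eq!(&instance.heap[ptr..ptr + 12], b"return value");
}
```

With this shape, a runtime that lacks the `alloc` export simply gets `None`, which is where the client-side check discussed later in the thread comes in.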
However, if possible, I would like to avoid reentrancy into the same wasm instance [^1]. One annoying thing is the stack, especially in the current implementation of wasmtime, because the stack limit is shared between nested calls and the host functions. Maybe it's fine for Substrate, but PVF requires robustness against malicious code.
We now live in 2022 and can think about enabling new features. For example, with reference types[^3], a host function could return one `i32` and one `externref`. The `i32` holds the size of the buffer and the `externref` is an opaque pointer to the buffer. Then the runtime would internally allocate memory to fit the buffer by whatever means it wants. Finally, the runtime would call another host function, passing the `externref`, and that call would copy the data from the internal buffer into the memory at the designated location. We still need to limit the number and sizes of the available buffers[^2]. So yeah, it's doable. There are plenty of options. The option we pick should be robust to malicious PVFs.
[^1]: Not entirely sure if that is possible; we might end up needing it for something like paritytech/polkadot-sdk#370.
[^2]: If we reduce the number to 1, we get "the scratch buffer" technique that was/is employed in `pallet_contracts`.
[^3]: The toolchain currently does not support this feature well, to my knowledge.
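The "scratch buffer" technique from footnote 2 can be illustrated with a small pure-Rust sketch. The `Host` type and its method names are hypothetical, not `pallet_contracts`' actual API: all host-function results pass through one fixed-size buffer, so the host never allocates in the runtime's memory, and the runtime copies results out into memory it allocated itself.

```rust
const SCRATCH_SIZE: usize = 1024;

struct Host {
    scratch: [u8; SCRATCH_SIZE], // the single shared buffer
    len: usize,                  // length of the current result
}

impl Host {
    // Host function side: place a result into the scratch buffer, or
    // signal that it does not fit (the caller must then fall back to
    // some other mechanism, e.g. a second, larger request).
    fn write_result(&mut self, data: &[u8]) -> Result<(), ()> {
        if data.len() > SCRATCH_SIZE {
            return Err(());
        }
        self.scratch[..data.len()].copy_from_slice(data);
        self.len = data.len();
        Ok(())
    }

    // Runtime side: copy the result out into memory the runtime
    // allocated itself, so the host never touches the runtime's heap.
    fn take_result(&self) -> Vec<u8> {
        self.scratch[..self.len].to_vec()
    }
}

fn main() {
    let mut host = Host { scratch: [0; SCRATCH_SIZE], len: 0 };
    host.write_result(b"result").unwrap();
    assert_eq!(host.take_result(), b"result");
    assert!(host.write_result(&[0u8; 2048]).is_err());
}
```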
Hi @koute @bkchr, I wonder about the progress of these runtime-allocator-related features. Has anyone designed this part, or have there been any new discussions?
No, nothing of this was designed so far.
TBH, I have designed this part before; I could just port it into Substrate. I have considered various aspects of upgrading the runtime specification, including version compatibility and performing spec-version checks during code-update upgrades.
Of course, this is only for Substrate; the Polkadot part may be more complicated, since there is additional verification logic for running parachains, but the approach should be the same.
Do you need a draft PR to show the idea?
> TBH, I have designed this part before, I could just port it into substrate.
Can you give some brief overview of your design?
First, we should disable the host allocator and export the runtime allocator. So we define a new allocator crate, `sp-allocator-v1`, wrapping `sp-io`:
```toml
[dependencies]
lol_alloc = { version = "0.3", optional = true }
# not used directly; only needed to enable its features
sp-io = { version = "7.0.0", path = "../io", default-features = false }

[features]
default = ["std"]
std = [
    "sp-io/std",
]
disable_oom = [
    "sp-io/disable_oom",
]
allocator-v1 = [
    "lol_alloc",
    "sp-io/disable_allocator",
    # TODO: this part needs to be redesigned
    "sp-io/disable_panic_handler",
]
```
The crate might have the following code:
```rust
use core::alloc::GlobalAlloc;
use lol_alloc::{AssumeSingleThreaded, FreeListAllocator};

#[global_allocator]
pub static ALLOCATOR: AssumeSingleThreaded<FreeListAllocator> =
    unsafe { AssumeSingleThreaded::new(FreeListAllocator::new()) };

#[no_mangle]
unsafe fn alloc(size: usize) -> *mut u8 {
    ALLOCATOR.alloc(core::alloc::Layout::array::<u8>(size).unwrap())
}

#[no_mangle]
unsafe fn dealloc(ptr: *mut u8, size: usize) {
    ALLOCATOR.dealloc(ptr, core::alloc::Layout::array::<u8>(size).unwrap())
}

#[no_mangle]
unsafe fn realloc(ptr: *mut u8, size: usize, new_size: usize) -> *mut u8 {
    ALLOCATOR.realloc(ptr, core::alloc::Layout::array::<u8>(size).unwrap(), new_size)
}

// TODO: maybe it's better to rename this crate to `sp-runtime-abi`.
/// A dummy function representing the version of the runtime ABI.
///
/// This name should be checked by the host.
#[no_mangle]
fn v1() {
    // nop
}

/// A default panic handler for the WASM environment.
#[panic_handler]
#[no_mangle]
pub fn panic(_info: &core::panic::PanicInfo) -> ! {
    core::arch::wasm32::unreachable()
}

/// A default OOM handler for the WASM environment.
#[alloc_error_handler]
pub fn oom(_layout: core::alloc::Layout) -> ! {
    core::arch::wasm32::unreachable()
}
```
The versioning here only covers the specification of the exported primitive functions; of course, I think it could also cover more of the specification.
And then we could add this crate as a dependency of `node-template-runtime`:
```toml
sp-allocator-v1 = { version = "7.0", default-features = false, path = "../../../primitives/allocator-v1" }
# ...

[features]
default = ["std"]
allocator-v1 = [
    "sp-allocator-v1/allocator-v1",
]
# ...
```
And we enable it with `extern crate sp_allocator_v1;`.
Now we could build a substrate wasm runtime without host allocator.
And since we mark the version info in the exported function name (or we could mark it in a custom section, but I think the former design is better), we could refactor `sc-executor` to check the version info when handling the `runtime_blob`. The only thing we need to be careful about is that when we encounter a `codeUpdate` we should recreate the instance.
Maybe we define an `AbiVersion` to handle the different versions:
```rust
/// The ABI version for the runtime.
#[derive(Debug, Copy, Clone, Eq, PartialEq, Default)]
#[non_exhaustive]
pub enum AbiVersion {
    /// Legacy allocator defined by the host.
    #[default]
    Legacy,
    /// V1, see `sp-allocator-v1`.
    V1,
}
```
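A minimal sketch of how the executor could map a runtime blob's export names to such an enum. This is hypothetical logic, not actual `sc-executor` code; `detect_abi` and the requirement of both the `v1` marker and the `alloc` export are assumptions for illustration.

```rust
/// The ABI version for the runtime (mirrors the enum above).
#[derive(Debug, Copy, Clone, Eq, PartialEq, Default)]
pub enum AbiVersion {
    /// Legacy allocator provided by the host.
    #[default]
    Legacy,
    /// V1: the runtime exports its own allocator.
    V1,
}

/// Pick the ABI based on the names exported by the runtime blob.
pub fn detect_abi(export_names: &[&str]) -> AbiVersion {
    // The `v1` marker export (plus the allocator entry point) selects
    // the new ABI; anything else falls back to the host allocator.
    if export_names.contains(&"v1") && export_names.contains(&"alloc") {
        AbiVersion::V1
    } else {
        AbiVersion::Legacy
    }
}

fn main() {
    assert_eq!(detect_abi(&["alloc", "dealloc", "v1"]), AbiVersion::V1);
    assert_eq!(detect_abi(&["Core_version"]), AbiVersion::Legacy);
}
```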
@bkchr
Try to build `node-template-runtime` here: https://github.com/yjhmelody/substrate/tree/executor-inside-allocator

```sh
cargo build -p node-template-runtime --release
```
Now the wasm executor no longer works, since the `v1` spec is not supported:

```sh
./target/release/node-template --dev --execution=wasm
```
Hi @bkchr @koute, should I make a PR to show the idea? The changes to the wasmtime part will be somewhat involved, though.
> And since we mark the version info in exported function name (Or mark it in custom section, but I think the former design is better)
IMO it's not necessary to add yet another version number to this. The client should just check whether the runtime exports a function named `alloc` (or a different name, if for some reason we're afraid that `alloc` might be accidentally exported).
There's also no need for `dealloc` and `realloc`; just `alloc` is enough.
However, in my opinion it wouldn't be a bad idea to also transition host functions to not perform allocations at all.
This is very straightforward for the `ext_crypto_*`, `ext_hashing_*`, `ext_trie_*`, and `*_clear_prefix` functions, by making them write to a pre-existing buffer rather than allocate a new one, and I think everyone would agree that this is a good change (I personally don't understand why functions such as `ext_hashing_sha2_256_version_1` don't already do that).
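The "write into a pre-existing buffer" shape for fixed-output host functions can be sketched as follows. The function name and its toy digest are made up for illustration; this is not SHA-256 and not the real host ABI. The point is the signature: the output size is known statically, so the runtime allocates the buffer itself and the host only fills it.

```rust
// Hypothetical signature change: instead of returning a pointer to a
// newly allocated 32-byte value, the host function writes into `out`,
// which the runtime allocated itself.
fn ext_hashing_toy_256(data: &[u8], out: &mut [u8; 32]) {
    // Trivial placeholder "digest": fold input bytes into the output.
    for (i, b) in data.iter().enumerate() {
        out[i % 32] = out[i % 32].wrapping_add(*b);
    }
}

fn main() {
    let mut out = [0u8; 32];
    ext_hashing_toy_256(b"hello", &mut out);
    // Same input, same output; nonzero for nonzero input.
    let mut out2 = [0u8; 32];
    ext_hashing_toy_256(b"hello", &mut out2);
    assert_eq!(out, out2);
    assert_ne!(out, [0u8; 32]);
}
```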
For the storage `read` functions it's a bit less straightforward, as we need to indicate how much we've read, but it should be simple as well.
The only not-so-simple ones could be the `*_next_key` and `*_get` storage functions, and `ext_misc_runtime_version_version_1`, as they return a variable-length value whose maximum length can't really be known in advance. We can always make them work by introducing the need to call them a second time if the buffer was too small the first time. This could lead to a performance degradation; my intuition says that in practice it wouldn't, but it should still be measured.
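The call-twice protocol for variable-length values can be sketched in pure Rust. `host_get` and `runtime_get` are illustrative names, not real host functions: the host copies as much as fits and always reports the full length, and the runtime retries once with an exactly-sized buffer when its first guess was too small.

```rust
// Host side: copy as much as fits into `out`, always return the full length.
fn host_get(value: &[u8], out: &mut [u8]) -> usize {
    let n = value.len().min(out.len());
    out[..n].copy_from_slice(&value[..n]);
    value.len()
}

// Runtime side: one call in the common case, two when the guess was short.
fn runtime_get(value: &[u8]) -> Vec<u8> {
    let mut buf = vec![0u8; 32]; // initial size guess
    let full = host_get(value, &mut buf);
    if full > buf.len() {
        // First buffer was too small: allocate exactly and call again.
        buf = vec![0u8; full];
        let full2 = host_get(value, &mut buf);
        debug_assert_eq!(full, full2);
    }
    buf.truncate(full);
    buf
}

fn main() {
    assert_eq!(runtime_get(b"short"), b"short");
    let long = vec![7u8; 100];
    assert_eq!(runtime_get(&long), long);
}
```

Whether the second call actually hurts in practice depends on how often values exceed the initial guess, which is exactly what would need measuring.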
> There's also no need for dealloc and realloc, just alloc is enough.
The current implementation obviously needs `dealloc`, and here is a `realloc` issue: https://github.com/paritytech/substrate/issues/11884
> However, in my opinion it wouldn't be a bad idea to also transition host functions to not perform allocations at all.
For this part: I don't think it rules out the idea of exporting `alloc`/`dealloc`.
When executing a big block's verification (such as a code update or a storage migration), memory must be freed to avoid OOM. It's not about the execution of one transaction; it's about the time and space of executing a whole big block.
What I mean is that the client never frees the memory it allocates, it's always the runtime that does that, so there's no need for the runtime to export this function.
Oh yeah, you are right.
But from an implementation point of view, it is not hard to export `dealloc` just like `alloc`, and doing so guarantees a certain degree of completeness of the interface.
Whether it is hard or not is completely irrelevant.
There exists a certain interface between the client and the runtime. Exporting `alloc` is about modifying this interface so that the client uses `alloc` to allocate the return values. Exporting `dealloc` as well would modify this interface further, and for no reason.
It is actually not a new idea and was considered very early. We ended up having the allocator provided by the host.
Lately, there were several discussions about moving the allocator back into the runtime.