gavofyork closed this issue 5 years ago
Currently the heap is controlled by code external to the runtime: there are `ext_malloc` and `ext_free` functions that lay out the heap and provide the allocation machinery. This approach has some upsides: it should be more performant, and it is somewhat more convenient (since some external functions actually require allocations).
But it has downsides too. The mechanism has to be implemented outside of the runtime, which for us means we need at least two different implementations: one for Rust and one for JS. Such a mechanism may therefore increase the chance of a consensus issue.
So maybe we should reconsider the question of the allocation mechanism. To start, we probably need to measure the performance impact of placing allocation inside the runtime.
There are a few places where unsafe code can be broken by changing the allocator, for example this. It's unfortunate that even though we have a custom system allocator, we can't be sure that it is safe to free memory that was allocated with the system allocator via `Vec::drop`.
We need this for contracts because they can perform a lot of allocations (bounded only by the gas limit).
> There are a few places...
I'm unconvinced any of these are unsafe, or at least cannot be made safe, in practice. I already checked that `Vec::drop` frees using `ext_free`, for sanity. Is there any way that `Vec::drop` could possibly deallocate other than by calling `ext_free`? Assuming that there is appropriate alignment (which is surely trivial given that we're dealing only with `Vec<u8>`), do we not satisfy those invariants in practice?
My worry is about the case where somebody violates an invariant and thus invokes UB: "in practice" is a very weak condition. Sure, these safety requirements depend on a lot of variables: the allocator used, the platform, the rustc/libstd version, the type of `T`. By fixing these you reduce the chance of actual breakage in practice.

But my main point is that there is still a chance of breakage, and we can't guarantee that the compiled code does what we expect. To check that in practice, we would have to check the behaviour every time the code is compiled. To make it worse, the documentation doesn't actually mention that this is related to the allocator; it just requires these invariants to be upheld. Thus, in theory, you have to check every function that is used, because some of them might rely on these invariants. And I don't mean cases like `push` (which performs a reallocation, which performs `free`ing), I mean literally every other method, including `.len()`. Even testing doesn't help much. This is the nature of UB, unfortunately.
Yes, I agree, this sounds over-conservative, pessimistic and theoretical. And yes, I agree that the chance of breakage is rather low. But it is still there and we have to admit it.
The good news is: the mentioned code is no longer a problem.
So while I certainly don't want to risk introducing UB into consensus code, I also don't want to be a slave to dogma.
The only thing we actually need to check is that the data at the pointer passed into `from_raw_parts` is allocated in precisely the same way that a `Vec<u8>` in the runtime would allocate it. This is a small piece of fairly trivial logic and is easy to audit prior to a production runtime release.
Other, non-Rust, runtimes may potentially need to `memcpy` it and `free` the pointer manually if they do not have a similar API, or cannot guarantee that their language's abstraction will use the `malloc`ed data and `free` it as expected. But that's not really our problem, at least not yet.

I don't want to add a needless `memcpy` into every single storage fetch due to an over-conservative reading of the documentation.
Note that my remark was more of a point in favour of the approach proposed here. So if we decided to implement that approach (i.e. the runtime allocates memory and Substrate fills it), then we wouldn't even have to bother with this issue. But yeah, as described and discussed, this requires some research.
Here is another idea, though still an invasive one: introduce a dedicated buffer type with:

- a `Drop` implementation which calls `ext_free` on that pointer explicitly,
- a `Deref<Target=[u8]>` impl, which allows treating it as a slice,
- a `.to_vec()` function (or rely on the `Deref` impl) to convert this buffer to a vector.

As far as I can see, this should satisfy all the use cases and not pessimise things too much: if a user wants to use it as a slice, `Deref` has them covered. If the user wants to use it as a `Vec`, i.e. `push` something, then the case is not pessimised compared to `Vec::from_raw_parts`, because length is equal to capacity and it would require a reallocation anyway. Cases like `pop` will experience a regression, though.
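A rough sketch of that wrapper type might look as follows. Everything here is hypothetical: the real pointer would come from the host, and the `ext_free` stand-in is stubbed (taking the length as well, so the example can run natively against the system allocator):

```rust
use std::ops::Deref;
use std::slice;

// Stand-in for the host-provided `ext_free`; assumes the buffer was leaked
// from a `Vec` with length == capacity.
unsafe fn ext_free(ptr: *mut u8, len: usize) {
    drop(Vec::from_raw_parts(ptr, len, len));
}

/// Buffer handed to the runtime by the host; frees itself via `ext_free`.
struct StorageBuffer {
    ptr: *mut u8,
    len: usize,
}

impl Drop for StorageBuffer {
    fn drop(&mut self) {
        // Free explicitly rather than reconstructing a `Vec`.
        unsafe { ext_free(self.ptr, self.len) }
    }
}

impl Deref for StorageBuffer {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl StorageBuffer {
    /// Copying conversion; a subsequent `push` would reallocate anyway,
    /// so this is no worse than the `Vec::from_raw_parts` route for that case.
    fn to_vec(&self) -> Vec<u8> {
        (**self).to_vec()
    }
}

fn main() {
    // Simulate the host handing us an allocation.
    let v = vec![1u8, 2, 3];
    let len = v.len();
    let ptr = Box::leak(v.into_boxed_slice()).as_mut_ptr();

    let buf = StorageBuffer { ptr, len };
    assert_eq!(buf.len(), 3); // slice methods via `Deref`
    assert_eq!(buf.to_vec(), vec![1u8, 2, 3]);
} // `buf` dropped here, freeing through `ext_free`
```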
> runtime allocates memory and substrate fills it
I don't like that approach as it involves an extra trip over the native/wasm divide, and an extra lookup into the hashmap or trie.
The other approach is, as you say, invasive. That said, that call is essentially only made when reading a typed storage item so it may not be much of an issue.
I mentioned this to @pepyakin in a private conversation, but I think that I have a design that removes the need for calling into the contract's `malloc` implementation at all (and therefore `malloc` can be just another function, and we can have some implementation of dynamic libraries that addresses the code size issues).

Essentially, `read_storage(key_ptr: *const u8, key_len: u32)` (or whatever the signature is) returns some handle (probably just a `u32`) that can then be used to read a fixed number of bytes at an offset with a different function, using a set of functions like:
extern "sysv64" fn open_handle(key_ptr: *const u8, key_len: u32) -> u32;
// Returns number of bytes written
extern "sysv64" fn read_from_handle(handle: u32, offset: u64, dst: *mut u8, dst_len: u32) -> u32;
// This would likely be called in the `Drop::drop` impl of some wrapper type
extern "sysv64" fn free_handle(key_ptr: *const u8, key_len: u8);
We can start with only allowing one handle to be active at once, since that massively reduces implementation complexity, and we only expose it via a shim of the existing `read_storage(key: &[u8]) -> Vec<u8>` that would be implemented in userspace. In the future we can write a wrapper that implements `Read` (and we can also implement `Write` using a similar approach), which would allow streaming the data. AFAICT this would be zero-overhead compared to the current approach if userspace also had a way to get the length of the value given a handle to it, so that it could preallocate a vector with that space; we wouldn't even need to write an optimised `memcpy` that the contract could call out to, as I was originally suggesting to Sergei, since the `memcpy` itself would be done by the host.
Obviously it's quite the rearchitecture, but the way I see it this removes most of the complexities around `malloc`, is no slower than the current implementation, and allows a lot more flexibility in userland code, allowing contracts that do no allocation at all and only operate on fixed-size buffers or streaming data.
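A sketch of what the userspace shim could look like. The host functions (`open_handle`, `handle_len`, `read_from_handle`, `free_handle`) are hypothetical; `handle_len` in particular is an assumed addition for preallocation, and all of them are stubbed here over an in-memory map so the example runs:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Fake storage backing the stubbed host functions.
    static STORE: RefCell<HashMap<Vec<u8>, Vec<u8>>> = RefCell::new(HashMap::new());
    // Only one handle may be open at a time, per the proposal.
    static OPEN: RefCell<Option<Vec<u8>>> = RefCell::new(None);
}

// Stand-ins for the proposed host functions.
fn open_handle(key: &[u8]) -> u32 {
    OPEN.with(|o| *o.borrow_mut() = STORE.with(|s| s.borrow().get(key).cloned()));
    0
}
fn handle_len(_handle: u32) -> u64 {
    OPEN.with(|o| o.borrow().as_ref().map_or(0, |v| v.len() as u64))
}
fn read_from_handle(_handle: u32, offset: u64, dst: &mut [u8]) -> u32 {
    OPEN.with(|o| {
        let o = o.borrow();
        let val = match o.as_ref() { Some(v) => v, None => return 0 };
        let start = offset as usize;
        let n = dst.len().min(val.len().saturating_sub(start));
        dst[..n].copy_from_slice(&val[start..start + n]);
        n as u32
    })
}
fn free_handle(_handle: u32) {
    OPEN.with(|o| *o.borrow_mut() = None);
}

// The shim: same shape as the existing `read_storage(key: &[u8]) -> Vec<u8>`,
// but built on the handle API; the length query lets us preallocate exactly.
fn read_storage(key: &[u8]) -> Vec<u8> {
    let h = open_handle(key);
    let len = handle_len(h) as usize;
    let mut buf = vec![0u8; len];
    let mut read = 0usize;
    while read < len {
        read += read_from_handle(h, read as u64, &mut buf[read..]) as usize;
    }
    free_handle(h);
    buf
}

fn main() {
    STORE.with(|s| s.borrow_mut().insert(b"key".to_vec(), b"value".to_vec()));
    assert_eq!(read_storage(b"key"), b"value".to_vec());
    assert_eq!(read_storage(b"missing"), Vec::<u8>::new());
}
```

In the real design the copy in `read_from_handle` would be performed by the host across the wasm boundary, which is what makes the scheme zero-copy-equivalent on the runtime side.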
> if the userspace also had a way to get the length of the value given a handle to it so they could preallocate a vector with that space
That would probably necessitate a further call across the wasm/native divide, and potentially having to go back to the hashmap, without a substantial redesign of the native side.
Streaming data in user-land from native is a non-starter. We do it already with `IncrementalInput` and it's slow, since a round trip to native land is far slower than a pointer addition. One of my tasks will be to remove it entirely.
I checked the impl of `Vec`, and unsurprisingly the code demonstrates that it's really not an intractable problem to ensure there's no UB. Here's the code in question: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L80
```rust
let align = mem::align_of::<T>();
let layout = Layout::from_size_align(alloc_size, align).unwrap();
let result = if zeroed {
    a.alloc_zeroed(layout)
} else {
    a.alloc(layout)
};
```
For our `Vec`, `A` is `Global`, which means `a.alloc` effects a direct call into our `ext_malloc` via `__rust_alloc`, which handles alignment.
To make the existing code correct we just need to ensure the data is aligned to `mem::align_of::<u8>()`, as the runtime would expect. This is a fixed platform value that is unlikely to change and can be determined by running https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=7853a458ee9621a5b742fd64414f9c70
> if the userspace also had a way to get the length of the value given a handle to it so they could preallocate a vector with that space
>
> That would probably necessitate a further call across the wasm/native divide, and potentially having to go back to the hashmap, without a substantial redesign of the native side.
>
> Streaming data in user-land from native is a non-starter. We do it already with `IncrementalInput` and it's slow, since a round trip to native land is far slower than a pointer addition. One of my tasks will be to remove it entirely.
A round trip to native code need not be expensive. It is with wasmi, but all calls are expensive with wasmi; with lightbeam it would be essentially free, or as free as any sysv function call can be. Obviously lightbeam is pre-pre-pre-pre-pre-alpha right now, but I don't think reducing host calls is necessarily a reason to throw the idea out the window.
My idea is that when you request a handle the native side fetches the vector into memory, either just using references with rental
or by prehashing the key and making subsequent accesses fast that way. This would make getting the length just the price of a host call plus a single pointer deref.
EDIT: We can also work on making host calls faster in wasmi somehow.
Sure, but this would require a lot of redesign, refactoring and optimisation, is complex, and accomplishes essentially nothing more than what we already have.
Made a start in `gav-start-refactor`.
Closed by #1460
Reopen due to revert https://github.com/paritytech/substrate/pull/1502
We currently use a linear allocator. It's very inefficient and means we need ~128 × 64KB pages to validate a block with a 350kb transaction in it. Switch this for a buddy allocator or slab allocator.
Furthermore, `sr-io::storage` was recently pessimised by adding a needless memcpy (`into_vec`). The code should be reverted thus:

To ensure this code doesn't result in UB on the part of `Vec`, `ext_get_allocated_storage` should ensure that the pointer is aligned according to `u8` allocations on the `unknown-unknown-webassembly` platform (almost certainly 4-byte). This won't be necessary as long as the `Heap` implementation ensures all allocations are on such a boundary.