Closed Ddystopia closed 7 months ago
If you are willing to accept this feature, I can contribute. I don't know which API you would prefer, so I would be grateful for your opinion and ideas.
Hi @Ddystopia ,
from what I know, the Wasm3 interpreter provides Wasm capabilities for very constrained systems that cannot even allocate a single 64kB Wasm linear memory page. It would be interesting to know what design the Wasm3 maintainers chose and to decide whether such a design could be implemented for Wasmi. Generally, contributions like these are welcome as long as they neither overcomplicate the interpreter nor regress its performance.
Ideally we'd have an efficient automated system. The next best option is to add a `Config` setting for this new memory option. Due to compatibility with the Wasmtime API, I would not like to change the API of the `Memory` type, for example. Etc.
Wasm3 uses an optional config option to limit the maximum amount of memory that can be allocated.
The following devices can run Wasm3, yet they cannot afford to allocate even a single linear memory page (64KB). This means `memoryLimit` should be set to the actual amount of RAM available, and that in turn usually breaks the allocator of the hosted Wasm application (which still assumes the page is 64KB and performs OOB accesses).
Introducing that would break guarantees in the `ResourceLimiter` docs.
I would like that feature as well, but a custom memory backing option would better suit my needs and would allow integrating such a feature seamlessly.
What do you think about making a trait from the current `ByteBuffer` (or maybe something more abstract)? Maybe add a config entry taking something like a `Box<dyn ByteBuffer>` factory, or introduce a new generic parameter on the `Store`, `B: ByteBuffer` (in impls that would require it)?
As far as I can see from the codebase, there are two sources of `MemoryEntity` construction: from Wasm itself and manually by the user. I personally don't need it, but I suppose from a library perspective it would be nice to give the user more information, so that within their implementation they can choose the representation dynamically, for example `Vec<u8>` or `&'static mut [u8; N]`. In the case where the user constructs memory manually this would not be a problem, but I don't know what additional info could be passed when memories are constructed from the Wasm blob itself.
In that case `ResourceLimiter` may not be needed at all - the user can manage that kind of problem with a custom implementation.
Generally I would like to keep the API surface of Wasmi as simple as possible and do not unnecessarily move away from Wasmtime.
After reading all your thoughts, your main goal seems to be avoiding dynamic allocations. Maybe the simplest way is to just use a custom global allocator with a stack-based memory backing. Memory allocated via Rust `Vec` etc. would then still live on the stack. This solution wouldn't require a change to Wasmi and should fit your needs. It is very easy to write your own global allocator in Rust: https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html
It is a bit sad that Wasm3's solution to the 64kB Wasm page problem is to simply not allow allocating a single page. But I am also not really aware of any proper solution for these targets. With Wasmi's resource limiter you can either prevent growing of linear memories, just like Wasm3, or limit their growth to an amount supported by your custom global allocator.
There might be some confusion. My goal is not reducing dynamic allocations; I already have static memory dedicated to the heap and a custom global allocator. My goal is to not have that allocation in that heap - it is more stable, because code would fail at build time rather than at runtime, since static memory is handled by the compiler/linker. It would also reduce damage from fragmentation and make the system more stable.
The only way that changing the global allocator further would help is if I could write something like:

```rust
if layout.size() == 64 * 1024 /* 64KiB */ {
    return ptr_to_static_array;
}
// ...
```
My goal is more about reducing the size of the heap altogether (I need to define it at build time), and fragmentation adds a lot of trouble with that 64KiB page (I now need to give the heap more than an additional 64KiB).
While I'm talking about it as "my problem", it is applicable to the majority of embedded use cases: the hardware I am using has relatively much RAM and would still greatly benefit from such a feature, so devices with fewer resources would benefit even more. Fewer resources = cheaper hardware.
What would really help me is if you could explain the following things:
There seems to be some complexity here that needs to be solved. With those questions I want to make sure that we ideally do not simply move the complexity elsewhere (e.g. into Wasmi) but instead reduce it to its core components and then decide how to proceed. I am myself not very experienced with programming embedded systems, so please feel free to elaborate in your answers.
To answer your questions from earlier:
> What do you think about making a trait from the current `ByteBuffer` (or maybe something more abstract)?
We had this in Wasmi to support OS-based virtual memory pages. This was done for performance reasons (before I worked on Wasmi). When I benchmarked it I found that it did not even improve performance, because the implementation didn't make use of virtual memory capabilities, so I kicked the whole feature out of Wasmi - and it felt good, because it was a hard-to-maintain and hard-to-test part of the codebase. For that feature we didn't use a trait but compilation flags, which is arguably worse. However, I think neither approach would really fit into Wasmi's design, especially with its Wasmtime mirror approach, which is important to us.
> Maybe add a config entry taking something like a `Box<dyn ByteBuffer>` factory, or introduce a new generic parameter on the `Store`, `B: ByteBuffer` (in impls that would require it)?
A generic parameter on the `Store` is among the things I'd probably never accept in a merge request for any feature, since it would be quite disruptive design-wise.
One solution that could maybe work out is to see if it is possible to find common internal patterns between the current `ByteBuffer` and a backend that uses a static buffer, and to provide a new constructor for `Memory` that allows constructing such a static-buffer-based `Memory` from those core parts (probably an `unsafe` API). The big undertaking here is to make sure that this does not regress the common `ByteBuffer` usage and stays an isolated change.
Hello, I'm a colleague of @Ddystopia and maybe I can answer those last questions better.
In general, embedded systems are required to be reliable and predictable, capable of running 24/7 for several years. Every unexpected runtime error / bug / fault / OOM / whatever is a big problem - imagine you go on a 10-day winter holiday, and the heating controller in your house stops working on day 2... (certainly there are many much more dangerous examples than this).
That's why it is so important for us to have as much compile-time guarantees as possible - to rule out as many runtime errors as possible.
> How does code fail at build time if you use static memory as the base for Wasmi linear memory instead of heap-allocated memory via your custom allocator?
If we could create a 64KB static mut array (or something similar) and then pass it to WASMI, there would be a compile-time guarantee that WASMI always gets the RAM it needs - without any additional testing. If the WASMI memory is dynamically allocated, as it is now, we only get this guarantee by testing the firmware on real hardware. Since the firmware is evolving and the heap is also used by other parts of the code (those are much smaller allocations, though), a change in code unrelated to WASMI can still make WASMI fail due to a heap allocation failure. So now nearly any code change requires retesting WASMI to confirm there is still enough unfragmented heap free for it to work.
> Is memory fragmentation the main problem? If so, maybe this could be helped by adjusting your custom allocator to your needs a bit better with respect to Wasm's 64kB memory pages. E.g. the custom allocator could have a special routine whenever 64kB (or multiples of it) are allocated.
It's part of the problem. Fragmentation is not a problem for small allocations, but it's always a problem for 64KB-sized allocations (at least in the MCU world). Of course the allocator could be adapted as you suggest, but this would still not provide a compile-time guarantee.
> What are the specs of the embedded system you are targeting? In particular, memory is of course of interest, but you can be more elaborate.
Renesas RA6M3 - it's an advanced MCU with a 120 MHz Cortex-M4, 2MB flash and 640KB RAM.
@elrafoon Thanks for the elaborate information and details!
I think the best way to approach this problem from within Wasmi is to experiment with the internal implementation of `MemoryEntity` and try to find common structures between the current `ByteBuffer` and a static-memory byte buffer. In essence, `MemoryEntity` looks like this internally:
```rust
struct MemoryEntity {
    bytes: Vec<u8>,
    memory_type: MemoryType,
    current_pages: Pages,
}
```
The simplest design that allows for both `Vec` and static-buffer-based backends would be something like this:
```rust
enum MemoryEntity {
    Vec {
        bytes: Vec<u8>,
        memory_type: MemoryType,
        current_pages: Pages,
    },
    Static {
        bytes: &'static mut [u8],
        memory_type: MemoryType,
        current_pages: Pages,
    },
}
```
However, this would incur overhead when accessing fields of `MemoryEntity`, and it duplicates fields between the two variants. Therefore another possibility is to deconstruct the `Vec` into its core parts, as given by https://doc.rust-lang.org/std/vec/struct.Vec.html#method.from_raw_parts:
```rust
use alloc::vec::Vec;
use core::slice;

struct MemoryEntity {
    ptr: *mut u8,
    length: usize, // formerly known as `current_pages`
    capacity: usize,
    memory_type: MemoryType,
    // Needed to differentiate `Drop` behavior since only a `Vec` based
    // `MemoryEntity` must deallocate. (`static` is a reserved keyword
    // in Rust, hence `is_static`.)
    is_static: bool,
}

impl MemoryEntity {
    pub fn data(&self) -> &[u8] {
        // SAFETY: `ptr` and `length` always describe a live allocation.
        unsafe { slice::from_raw_parts(self.ptr, self.length) }
    }

    pub fn data_mut(&mut self) -> &mut [u8] {
        // SAFETY: as above; `&mut self` guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.length) }
    }
}

impl Drop for MemoryEntity {
    fn drop(&mut self) {
        if self.is_static {
            return;
        }
        // Reassembling the `Vec` properly deallocates the internal byte buffer.
        // SAFETY: `ptr`, `length` and `capacity` originate from a `Vec<u8>`.
        unsafe {
            drop(Vec::from_raw_parts(self.ptr, self.length, self.capacity));
        }
    }
}
```
This design gives us no regressions for accessing the underlying byte buffer, since there are no extra checks in `data` or `data_mut`. The only inconvenience is that `MemoryEntity::grow` needs to treat the two buffer backends differently. The only user-visible change with this design is that we are going to need another `Memory` constructor that allows creating a `MemoryEntity` with a static byte buffer:
```rust
impl Memory {
    pub unsafe fn new_static(
        mut ctx: impl AsContextMut,
        ty: MemoryType,
        buffer: &'static mut [u8],
    ) -> Result<Self, MemoryError> {
        // etc...
        // calls `MemoryEntity::new_static` internally
    }
}

impl MemoryEntity {
    pub unsafe fn new_static(
        memory_type: MemoryType,
        buffer: &'static mut [u8],
        limiter: &mut ResourceLimiterRef<'_>,
    ) -> Result<Self, MemoryError> {
        // etc...
    }
}
```
With this information in mind (hopefully it works), you should be able to craft an experimental PR and see if it meets your expectations. Afterwards we can try to bring it into shape for an eventual merge - no promises, though.
Note that those functions are marked `unsafe`, since `&'static mut T` usage is generally shared-memory usage, which is kind of a bad practice in Rust since borrow checking it is not really possible. You could for example create multiple `MemoryEntity` instances with the same `&'static mut [u8]` byte buffer and have overlapping memories, and therefore no guarantees whatsoever about memory accesses being exclusive as normally guaranteed by the borrow checker. This is also the reason why I am not sure if I actually want to add this to Wasmi: ideally you'd add `unsafe` to all of its API, since accessing `data` or `data_mut`, or doing anything with the underlying byte buffer, is not checked and therefore must be trusted.
Making the function unsafe seems unnecessary, as `&mut` references are inherently unique. Any situation where two `&mut` references point to the same memory would already result in undefined behavior well before any interaction with wasmi occurs. Also, you can get a `&'static mut` reference from `Box::leak`, which is totally safe.
Also, I am not sure if `MemoryEntity` needs to look like this:
```rust
struct MemoryEntity {
    ptr: *mut u8,
    length: usize,
    capacity: usize,
    memory_type: MemoryType,
    is_static: bool,
}
```
Rust's compiler optimizations might make the following code just as fast, so benchmarks should make the final decision:
```rust
// size_of::<ActualStorage>() == size_of::<Vec<u8>>()
enum ActualStorage {
    Vec(Vec<u8>),
    Static(&'static mut [u8]),
}

struct MemoryEntity {
    storage: ActualStorage,
    memory_type: MemoryType,
    current_pages: Pages,
}

impl MemoryEntity {
    pub fn data(&self) -> &[u8] {
        match &self.storage {
            ActualStorage::Vec(it) => &it[..],
            ActualStorage::Static(it) => &it[..],
        }
    }

    pub fn data_mut(&mut self) -> &mut [u8] {
        match &mut self.storage {
            ActualStorage::Vec(it) => &mut it[..],
            ActualStorage::Static(it) => &mut it[..],
        }
    }

    pub fn grow(
        &mut self,
        additional: Pages,
        limiter: &mut ResourceLimiterRef<'_>,
    ) -> Result<Pages, EntityGrowError> {
        let ActualStorage::Vec(vec) = &mut self.storage else {
            return Err(EntityGrowError);
        };
        // ...
    }
}

impl Drop for MemoryEntity {
    fn drop(&mut self) {
        // ...
    }
}
```
> Rust's compiler optimizations might make the following code just as fast
That's not possible, since in the `enum` case the Rust compiler has to make a conditional check on the enum discriminant every single time, whereas in my example this check is avoided entirely for the commonly used `data` and `data_mut` methods. I usually take a look at the online Compiler Explorer for Rust to see codegen effects for things like these.
You are generally right that benchmarks should be conducted before implementing anything. Feel free to do so.
@Ddystopia @elrafoon is there still interest in doing an experimental implementation for this feature? Otherwise I'd like to close this issue.
Yes, there is. I will submit a PR in the near future. It's just that there are higher priorities right now.
I'm currently using WASMI for a project on an embedded target and am exploring ways to optimize memory management. The standard `Vec<u8>` used for Wasm memory can be restrictive in embedded contexts, where heap allocations are often less desirable. My aim is to have an option for using a static array, or a similar lean memory structure, in place of `Vec<u8>`.

Would it be possible to consider an alternative to `Vec<u8>`, maybe through a trait implementation that allows for different memory storage types? This addition would not only help my case but could also open up new possibilities for WASMI in various specialized environments.