Closed Ddystopia closed 7 months ago
If you are willing to accept this feature, I can contribute. I don't know which API you would prefer, so I would be grateful for your opinion and ideas.
Hi @Ddystopia ,
from what I know, the Wasm3 interpreter provides Wasm capabilities for very constrained systems that cannot even allocate a single 64kB Wasm linear memory page. It would be interesting to know what design the Wasm3 maintainers chose and to decide whether such a design could be implemented for Wasmi. Generally, contributions like these are welcome as long as they neither overcomplicate the interpreter nor regress its performance.
Ideally we'd have an efficient automated system. The next best option is to add a `Config` setting for this new memory option. Due to compatibility with the Wasmtime API, I would not like to change the API of the `Memory` type, for example. Etc.
Wasm3 uses an optional config option to limit the maximum amount of memory that can be allocated.
The following devices can run Wasm3, yet they cannot afford to allocate even a single linear memory page (64KB). This means `memoryLimit` should be set to the actual amount of RAM available, and that in turn usually breaks the allocator of the hosted Wasm application (which still assumes the page is 64KB and performs OOB accesses).
Introducing that would break guarantees in the `ResourceLimiter` docs.
I would like that feature as well, but a custom memory backing option would better suit my needs and would allow integrating such a feature seamlessly.
What do you think about making a trait from the current `ByteBuffer` (or maybe something more abstract)? Maybe add a config entry taking something like a `Box<dyn ByteBuffer>` factory, or introduce a new generic parameter on the `Store`, `B: ByteBuffer` (in impls that would require it)?
As far as I can see from the codebase, there are two sources of `MemoryEntity` construction: from Wasm itself and manually by the user. I personally don't need it, but I suppose from a library perspective it would be nice to give the user more information, so that within their implementation they can choose the representation dynamically, for example `Vec<u8>` or `&'static mut [u8; N]`. In the case where the user constructs memory manually this would not be a problem, but I don't know what additional info could be passed when memories are constructed from the Wasm blob itself.
In that case `ResourceLimiter` may not be needed at all - the user can manage that kind of problem with a custom implementation.
Generally I would like to keep the API surface of Wasmi as simple as possible and do not unnecessarily move away from Wasmtime.
After reading all your thoughts, your main goal seems to be avoiding dynamic allocations. Maybe the simplest way is to just use a custom global allocator with a stack-based memory backing. Memory allocated via Rust `Vec` etc. would then still live on the stack. This solution wouldn't require a change to Wasmi and should fit your needs. It is very easy to write your own global allocator in Rust: https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html
It is a bit sad that Wasm3's solution to the 64kB Wasm page problem is to simply not allow allocating a single page. But I am also not really aware of any proper solution for these targets. With Wasmi's resource limiter you can either prevent growing of linear memories, just like Wasm3, or limit their growth to an amount supported by your custom global allocator.
There might be some confusion. My goal is not reducing dynamic allocations; I already have static memory dedicated to the heap and a custom global allocator. My goal is to not have that allocation in that heap - it is more stable, because code would fail at build time rather than at runtime, since static memory is handled by the compiler/linker. It would also reduce damage from fragmentation and make the system more stable.
The only way that changing the global allocator further would help is if I could write something like:

```rust
if layout.size() == 64 * 1024 /* 64KiB */ {
    return ptr_to_static_array;
}
// ...
```
My goal is more about reducing the size of the heap altogether (I need to define it at build time), and fragmentation adds a lot of trouble with that 64KiB page (I now need to give the heap more than an additional 64KiB).
While I'm talking about it as "my problem", it is applicable to the majority of embedded use cases: the hardware I am using has relatively much RAM and would still greatly benefit from such a feature, so devices with fewer resources would benefit even more. Fewer resources = cheaper hardware.
What would really help me is if you could explain the following things:
There seems to be some complexity here that needs to be solved. With those questions I want to make sure that we ideally do not simply move the complexity elsewhere (e.g. into Wasmi) but instead reduce it to its core components and then decide how to proceed. I am myself not very experienced with programming embedded systems, so please feel free to elaborate in your answers.
To answer your questions from earlier:
> What do you think about making a trait from the current `ByteBuffer` (or maybe something more abstract)?
We had this in Wasmi to support OS-based virtual memory pages. This was done for performance reasons (before I worked on Wasmi). When I benchmarked it I found that it did not even improve performance, because the implementation didn't make use of virtual memory capabilities, so I kicked the whole feature out of Wasmi - and it felt good, because it was a hard-to-maintain and hard-to-test part of the codebase. For that feature we didn't use a trait but compilation flags, which is arguably worse. However, I think neither approach would really fit into Wasmi's design, especially with its Wasmtime mirror approach, which is important to us.
> Maybe add a config entry taking something like a `Box<dyn ByteBuffer>` factory, or introduce a new generic parameter on the `Store`, `B: ByteBuffer` (in impls that would require it)?
A generic parameter on the `Store` is among the things I'd probably never accept in a merge request for any feature, since it would be quite disruptive design-wise.
One solution that could maybe work out is to see if it is possible to find common internal patterns between the current `ByteBuffer` and a backend that uses a static buffer, and to provide a new constructor for `Memory` that allows constructing such a static-buffer-based `Memory` from those core parts (probably an `unsafe` API). The big undertaking here is to make sure that this does not regress the common `ByteBuffer` usage and stays an isolated change.
Hello, I'm a colleague of @Ddystopia and maybe I can answer those last questions better.
In general, embedded systems are required to be reliable and predictable, capable of running 24/7 for several years. Every unexpected runtime error / bug / fault / OOM / whatever is a big problem - imagine you go on a 10-day winter holiday, and the heating controller in your house stops working on day 2... (certainly there are many much more dangerous examples than this).
That's why it is so important for us to have as much compile-time guarantees as possible - to rule out as many runtime errors as possible.
> How does code fail at build time if you use static memory as the base for Wasmi linear memory instead of heap-allocated memory via your custom allocator?
If we could create a 64KB static mut array (or something similar) and then pass it to WASMI, there would be a compile-time guarantee that WASMI always gets the RAM it needs - without any additional testing. If the WASMI memory is dynamically allocated, as it is now, we only get this guarantee by testing the firmware on real hardware. Since the firmware is evolving and the heap is also used by other parts of the code (those are much smaller allocations, though), a change in code unrelated to WASMI can still make WASMI fail due to a heap allocation failure. So now nearly any code change requires retesting WASMI to confirm there is still enough unfragmented heap free for it to work.
> Is memory fragmentation the main problem? If so, maybe this could be helped by adjusting your custom allocator to your needs a bit better with respect to Wasm's 64kB memory pages. E.g. the custom allocator could have a special routine whenever 64kB (or multiples of it) are allocated.
It's part of the problem. Fragmentation is not a problem for small allocations, but it's always a problem for 64KB-sized allocations (at least in the MCU world). Of course the allocator could be adapted as you suggest, but this would still not provide a compile-time guarantee.
> What are the specs of the embedded system you are targeting? In particular, memory is of course of interest, but you can be more elaborate.
Renesas RA6M3 - it's an advanced MCU with a 120 MHz Cortex-M4, 2MB flash and 640KB RAM.
@elrafoon Thanks for the elaborate information and details!
I think the best way to approach this problem from within Wasmi is to experiment with the internal implementation of `MemoryEntity` and try to find common structures between the current `ByteBuffer` and a static-memory byte buffer. In essence, `MemoryEntity` looks like this internally:
```rust
struct MemoryEntity {
    bytes: Vec<u8>,
    memory_type: MemoryType,
    current_pages: Pages,
}
```
The simplest design that allows for both `Vec` and static-buffer-based backends would be something like this:
```rust
enum MemoryEntity {
    Vec {
        bytes: Vec<u8>,
        memory_type: MemoryType,
        current_pages: Pages,
    },
    Static {
        bytes: &'static mut [u8],
        memory_type: MemoryType,
        current_pages: Pages,
    },
}
```
However, this would incur overhead when accessing fields of `MemoryEntity`, and it duplicates fields between the two variants. Therefore another possibility is to deconstruct the `Vec` into its core parts, as given by https://doc.rust-lang.org/std/vec/struct.Vec.html#method.from_raw_parts:
```rust
use alloc::vec::Vec;
use core::slice;

struct MemoryEntity {
    ptr: *mut u8,
    length: usize, // formerly known as `current_pages`
    capacity: usize,
    memory_type: MemoryType,
    // Needed to differentiate `Drop` behavior since only a `Vec` based
    // `MemoryEntity` must deallocate. (`static` is a reserved keyword
    // in Rust, hence `is_static`.)
    is_static: bool,
}

impl MemoryEntity {
    pub fn data(&self) -> &[u8] {
        // SAFETY: `ptr` and `length` always describe a live allocation.
        unsafe { slice::from_raw_parts(self.ptr, self.length) }
    }

    pub fn data_mut(&mut self) -> &mut [u8] {
        // SAFETY: as above; `&mut self` guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.length) }
    }
}

impl Drop for MemoryEntity {
    fn drop(&mut self) {
        if self.is_static {
            return;
        }
        // Reassembling the `Vec` properly deallocates the internal byte buffer.
        // SAFETY: `ptr`, `length` and `capacity` originate from a `Vec<u8>`.
        unsafe {
            drop(Vec::from_raw_parts(self.ptr, self.length, self.capacity));
        }
    }
}
```
This design gives us no regressions for accessing the underlying byte buffer, since there are no extra checks in `data` or `data_mut`. The only inconvenience is that `MemoryEntity::grow` needs to treat the two buffer backends differently. The only user-visible change with this design is that we are going to need another `Memory` constructor that allows creating a `MemoryEntity` with a static byte buffer:
```rust
impl Memory {
    pub unsafe fn new_static(
        mut ctx: impl AsContextMut,
        ty: MemoryType,
        buffer: &'static mut [u8],
    ) -> Result<Self, MemoryError> {
        // etc...
        // calls `MemoryEntity::new_static` internally
    }
}

impl MemoryEntity {
    pub unsafe fn new_static(
        memory_type: MemoryType,
        buffer: &'static mut [u8],
        limiter: &mut ResourceLimiterRef<'_>,
    ) -> Result<Self, MemoryError> {
        // etc...
    }
}
```
With this information in mind (hopefully it works), you should be able to craft an experimental PR and see if it meets your expectations. Afterwards we can try to bring it into shape for an eventual merge - no promises, though.
Note that those functions are marked `unsafe`, since `&'static mut T` usage is generally shared-memory usage, which is kind of a bad practice in Rust since borrow checking it is not really possible. You could for example create multiple `MemoryEntity` instances with the same `&'static mut [u8]` byte buffer and have overlapping memories, and therefore no guarantees whatsoever about memory accesses being exclusive as normally guaranteed by the borrow checker. This is also the reason why I am not sure if I actually want to add this to Wasmi: ideally you'd add `unsafe` to all of its API, since accessing `data` or `data_mut`, or doing anything with the underlying byte buffer, is not checked and therefore must be trusted.
Making the function unsafe seems unnecessary, as `&mut` references are inherently unique. Any situation where two `&mut` references point to the same memory would already result in undefined behavior well before any interaction with wasmi occurs. Also, you can get a `&'static mut` reference from `Box::leak`, which is totally safe.
Also, I am not sure if `MemoryEntity` needs to look like this:
```rust
struct MemoryEntity {
    ptr: *mut u8,
    length: usize,
    capacity: usize,
    memory_type: MemoryType,
    is_static: bool,
}
```
Rust's compiler optimizations might make the following code just as fast, so benchmarks should make the final decision:
```rust
// size_of::<ActualStorage>() == size_of::<Vec<u8>>()
enum ActualStorage {
    Vec(Vec<u8>),
    Static(&'static mut [u8]),
}

struct MemoryEntity {
    storage: ActualStorage,
    memory_type: MemoryType,
    current_pages: Pages,
}

impl MemoryEntity {
    pub fn data(&self) -> &[u8] {
        match &self.storage {
            ActualStorage::Vec(it) => &it[..],
            ActualStorage::Static(it) => &it[..],
        }
    }

    pub fn data_mut(&mut self) -> &mut [u8] {
        match &mut self.storage {
            ActualStorage::Vec(it) => &mut it[..],
            ActualStorage::Static(it) => &mut it[..],
        }
    }

    pub fn grow(
        &mut self,
        additional: Pages,
        limiter: &mut ResourceLimiterRef<'_>,
    ) -> Result<Pages, EntityGrowError> {
        let ActualStorage::Vec(vec) = &mut self.storage else {
            return Err(EntityGrowError);
        };
        // ...
    }
}

impl Drop for MemoryEntity {
    fn drop(&mut self) {
        // ...
    }
}
```
> Rust's compiler optimizations might make the following code just as fast
That's not possible, since in the `enum` case the Rust compiler has to make a conditional check on the enum discriminant every single time, whereas in my example this check is avoided entirely for the commonly used `data` and `data_mut` methods. I usually take a look at the online Compiler Explorer for Rust to see codegen effects for things like these.
You are generally right that benchmarks should be conducted before implementing anything. Feel free to do so.
@Ddystopia @elrafoon is there still interest in doing an experimental implementation for this feature? Otherwise I'd like to close this issue.
Yes, there is. I will submit a PR in the near future. It's just that there are higher priorities right now.
I'm currently using WASMI for a project on an embedded target and am exploring ways to optimize memory management. The standard `Vec<u8>` used for Wasm memory can be restrictive in embedded contexts, where heap allocations are often less desirable. My aim is to have an option for using a static array, or a similar lean memory structure, in place of `Vec<u8>`.

Would it be possible to consider an alternative to `Vec<u8>`, maybe through a trait implementation that allows for different memory storage types? This addition would not only help my case but could also open up new possibilities for WASMI in various specialized environments.