tessi / wasmex

Execute WebAssembly from Elixir
MIT License

Arguments and returns as a string #249

Closed sleipnir closed 3 years ago

sleipnir commented 3 years ago

First of all I would like to thank you for this excellent library. I need to call a function that takes a string as an argument and that returns a string as a result. In the examples I only saw the use of one or the other, never passing and receiving strings. It may seem trivial to expand, but I'm a little confused about the WASM API and the use of Memory to achieve this goal. Could someone help me?

tessi commented 3 years ago

Hey @sleipnir thanks for asking so nicely 💚

Passing and receiving memory really isn't as easy as I wish it was. So, sadly, I don't have an easy solution for you, only a lengthy post. I hope it still helps (and if it does, it could be something I point others to who have the same question). The good news is that there is a silver lining (see the footnote).


Handling strings is not easy because WebAssembly does not know "strings" (as we understand them in Elixir); it only sees "a bunch of bytes". When we look at our WebAssembly memory, the whole memory is just "a big array of bytes". So we need to know where to start reading the string from memory (memory_position, aka pointer) and how many bytes to read (length).

There are two ways to pass this information:

  1. The old-school "C language" way: Every string ends with a zero-byte ("\0") and we read the memory starting from our pointer until we discover a zero byte. The upside is that we only need to pass one argument to/from WebAssembly (the pointer). But the downside is that it opens the door wide for mistakes (what if we forgot a zero byte, what if our string contains zero-bytes, what if an attacker manages to sneak-in or remove some zero-bytes etc.).
  2. We always pass two arguments, the pointer and length (in bytes) of a string. This has the upside of being way more secure (against attackers and programming mistakes) but the downside of us needing to always pass two variables to/from WebAssembly. This is, by the way, how Rust handles Strings internally.
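To make the difference between the two conventions concrete, here is a minimal Rust sketch that reads the same bytes both ways. The helper names `read_c_string` and `read_ptr_len` are made up for illustration and are not part of wasmex; the sketch only shows how a zero byte inside the data changes the result.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::slice;

/// Approach 1 (hypothetical helper): scan from `ptr` until the first zero byte.
///
/// # Safety
/// `ptr` must point to a zero-terminated byte sequence.
pub unsafe fn read_c_string(ptr: *const u8) -> Vec<u8> {
    unsafe { CStr::from_ptr(ptr as *const c_char).to_bytes().to_vec() }
}

/// Approach 2 (hypothetical helper): read exactly `len` bytes starting at `ptr`.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes.
pub unsafe fn read_ptr_len(ptr: *const u8, len: usize) -> Vec<u8> {
    unsafe { slice::from_raw_parts(ptr, len).to_vec() }
}
```

Given the bytes `ab\0cd\0`, the first helper stops at the embedded zero byte and yields only `ab`, while the second, called with a length of 5, yields all of `ab\0cd`.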

For wasmex, we decided to go with the second approach. This works well for passing Strings down to WebAssembly. Given we have this method in WebAssembly (implemented in Rust, but could be any language compiling to wasm):

#[no_mangle]
pub extern "C" fn do_something_with_a_string(bytes: *const u8, length: usize) -> u8 {
    // do something with the given byte-array and string length
    0 // placeholder return value so the example compiles
}

We could call the method in Elixir like this:

{:ok, memory} = Wasmex.memory(instance, :uint8, 0)
string = "hello, world"
memory_position = 42 # aka "pointer"
Wasmex.Memory.write_binary(memory, memory_position, string) # copy the bytes to WASM memory, so our WASM function can see it

# pass the length in bytes (byte_size/1), not in characters (String.length/1)
Wasmex.call_function(instance, :do_something_with_a_string, [memory_position, byte_size(string)])
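The length handed to the wasm function is a byte count, which differs from the character count as soon as the string contains non-ASCII characters. The sketch below shows what the receiving side might look like in Rust. The function `is_valid_utf8` is a hypothetical example and not part of the wasmex test module; to export it from a real wasm module it would also need the `#[no_mangle]` attribute used in the examples above.

```rust
use std::slice;
use std::str;

/// Hypothetical guest function: returns 1 if the `length` bytes starting at
/// `bytes` form valid UTF-8, and 0 otherwise.
pub extern "C" fn is_valid_utf8(bytes: *const u8, length: usize) -> u8 {
    // Safety: the caller must guarantee that `bytes` is valid for `length` bytes.
    let data = unsafe { slice::from_raw_parts(bytes, length) };
    str::from_utf8(data).is_ok() as u8
}
```

For the string "añ", which is 2 characters but 3 bytes long, passing 3 gives the function the complete string. Passing 2 would cut the "ñ" in half, and the function would see invalid UTF-8.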

The other way around (handing a string from WebAssembly back to Elixir) is more complicated, because we can currently only return one value from wasm functions, but we need two values (pointer and length). (see 👣 footnotes for details)

This is often solved by having two functions in WebAssembly: One function producing a string, and another function returning the string-length.

#[no_mangle]
pub extern "C" fn demo_string() -> *const u8 {
    b"Hello, World!".as_ptr()
}

#[no_mangle]
pub extern "C" fn demo_string_len() -> u8 {
    13
}

We would use it in Elixir like this:

{:ok, [pointer]} = Wasmex.call_function(instance, :demo_string, [])
{:ok, [length]} = Wasmex.call_function(instance, :demo_string_len, [])
assert Wasmex.Memory.read_string(memory, pointer, length) == "Hello, World!"
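One risk of the two-function approach is that the hard-coded length drifts away from the actual string. A possible variation, sketched below in plain Rust, derives both values from one shared constant. The `GREETING` constant and the `read_demo_string` helper are assumptions for illustration. The helper plays the role of the host, and the length is returned as `usize` instead of `u8` so that strings longer than 255 bytes would still fit. As before, a real wasm export would also carry `#[no_mangle]`.

```rust
use std::slice;

// Single source of truth for both exported functions (illustrative constant).
static GREETING: &[u8] = b"Hello, World!";

/// Returns the pointer to the first byte of the string.
pub extern "C" fn demo_string() -> *const u8 {
    GREETING.as_ptr()
}

/// Returns the length in bytes, computed from the same constant.
pub extern "C" fn demo_string_len() -> usize {
    GREETING.len()
}

/// Stand-in for the host side: fetch pointer and length, then read the bytes.
pub fn read_demo_string() -> String {
    let ptr = demo_string();
    let len = demo_string_len();
    // Safety: `ptr` and `len` both describe the static `GREETING` slice.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    String::from_utf8_lossy(bytes).into_owned()
}
```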

Now that you know how to pass strings down to WebAssembly and back to Elixir again, there should be nothing stopping you from combining both approaches.

If your specific use-case requires it, you can of course go the "C strings" way of ending strings with zero-bytes or invent any other custom protocol (e.g. prefixing a string, so the first byte is the string-length). These custom routes don't have helper methods in wasmex, though.

👣 Footnote: "wasm can only return one value from a function call"

This is only half-true. In fact, the WebAssembly standard already allows returning multiple values. Wasmer, the WebAssembly execution engine we use, already partially implements that. Unfortunately, we don't have that feature yet in singlepass compilation (see https://docs.wasmer.io/ecosystem/wasmer/wasmer-features). Once wasmer supports multi-value returns everywhere, we can build better helper methods in Elixir to make the whole process way easier.

sleipnir commented 3 years ago

Hello, thanks for the detailed information.

Reading your answer, I think I was not entirely clear about my use case, so let me try to explain it properly. I need to call functions in wasm modules with arbitrary types and receive arbitrary types back. These wasm modules would be developed by third parties implementing a protocol for my application, so my implementation would load and execute them dynamically. Unfortunately, this is only possible with WebAssembly if we use Interface Types, which are not yet fully supported by the major wasm runtimes (as far as I know). So I thought about using strings: I could serialize my arbitrary types (Protobuf types, to be precise) to strings (Protobuf can easily be used with JSON), pass them to the functions, receive the result as a string, and then transform it back into Protobuf types. I am aware that this approach is not the most efficient, but while we do not have support for Interface Types in wasm I think it is one of the few viable alternatives. Unfortunately, requiring customers to write methods that return the string length complicates things for me, and I don't know if that would be an option.

Maybe if I could read all the memory, knowing that I am getting arbitrary bytes, I could transform the result directly into my Protobuf types (since Protobuf works on bytes directly) instead of working with strings. Did I explain that clearly?

tessi commented 3 years ago

Alright, I think I understand better. You want to pass arbitrary info into WebAssembly and back up again using protobufs. You can serialize your proto objects to JSON, but theoretically also to any string or byte sequence (to save some space, or be more time efficient).

What about a custom byte serialization where you write the following to wasm memory

00 00 00 00 00 00 00 05 48 65 6c 6c 6f
|---------------------| |------------|
          |                  |
 size (64 bit unsigned int)  |
                             |
                          bytes, must be `size` bytes long

This way, you can pass one pointer to wasm where the first 8 bytes encode the size of the following byte array. The following byte array could contain raw bytes (probably most efficient in your case) or json (better debugging as you can read the content).

The example above, should decode to a string of size 5 containing the byte values for "Hello".

Since the "header" part containing the string size is fixed-size, it's hopefully easy to de-/serialize this format. What do you think?
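As a rough illustration of the format described above, the following Rust sketch encodes and decodes such a size-prefixed buffer on plain byte slices. The names `encode`, `decode` and `HEADER_LEN` are invented for this sketch. `decode` returns `None` when the buffer is shorter than the header or shorter than the announced size, which is one possible way to handle truncated input.

```rust
use std::convert::{TryFrom, TryInto};

/// Size of the fixed header: one 64-bit unsigned integer, big endian.
pub const HEADER_LEN: usize = 8;

/// Prefix `payload` with its length in bytes.
pub fn encode(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as u64).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Read the header, then return exactly `size` payload bytes.
/// Returns `None` if `buf` is too short for the header or the payload.
pub fn decode(buf: &[u8]) -> Option<&[u8]> {
    let header: [u8; HEADER_LEN] = buf.get(..HEADER_LEN)?.try_into().ok()?;
    let size = usize::try_from(u64::from_be_bytes(header)).ok()?;
    let end = HEADER_LEN.checked_add(size)?;
    buf.get(HEADER_LEN..end)
}
```

Encoding "Hello" with this sketch produces the 13 bytes from the diagram: eight header bytes ending in `05`, followed by `48 65 6c 6c 6f`.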

sleipnir commented 3 years ago

@tessi Thank you for your kind and complete answer. I think it might be worth a try, but I am not sure how it would look using Wasmex. Sorry if I didn't fully understand.

tessi commented 3 years ago

I am not exactly sure if it fits your use case, but I sketched a wrapper module together that wraps Wasmex.Memory to implement the protocol mentioned above:

defmodule WasmBinaryTrampoline do
  @moduledoc """
  Assuming we have a :uint8-type memory, this module offers ways to write and read binaries
  to/from wasm memory.

  Binaries are written in two parts:

  1. size of the binary (8 bytes, unsigned int, big endian)
  2. binary content (having exactly the number of bytes given in `size`)

  We do *not* ensure that writing/reading memory fits the wasm memory bounds.
  """

  def write_binary(memory, index, binary) when is_binary(binary) do
    length = byte_size(binary)
    length_bytes = <<length::64-big>> # 8 bytes, big endian

    Wasmex.Memory.write_binary(memory, index, length_bytes)
    Wasmex.Memory.write_binary(memory, index + 8, binary)
  end

  def read_binary(memory, index) do
    length = memory
             |> Wasmex.Memory.read_binary(index, 8)
             |> :binary.decode_unsigned(:big)

    Wasmex.Memory.read_binary(memory, index + 8, length)
  end
end

Note: The name of the module might be a little weird (I'm open to suggestions here ;)) and I only tested it briefly.

If you open iex inside the wasmex repository root directory, the following should work

bytes = File.read!("test/wasm_test/target/wasm32-unknown-unknown/debug/wasmex_test.wasm")
{:ok, instance} = Wasmex.start_link(bytes)
{:ok, memory} = Wasmex.memory(instance, :uint8, 0)

WasmBinaryTrampoline.write_binary(memory, 0, "Hello") # :ok
WasmBinaryTrampoline.read_binary(memory, 0) # "Hello"

# just a test to see what we actually wrote to memory,
# we see the first 8 bytes being the size followed by actual content
Wasmex.Memory.read_binary(memory, 0, 8 + 5) # <<0, 0, 0, 0, 0, 0, 0, 5, 72, 101, 108, 108, 111>>

Hope it helps you. Also: if whatever you are building is open source (actually also if not) and you want to tell, I'd be very interested in what you are using wasmex for. It really motivates me to hear what this library is used for.

sleipnir commented 3 years ago

Hi @tessi, this is awesome. I am really happy with all the attention you gave my question. I will try to use the module.

Yes, we have two projects: one is already open but still very, very early (WIP), and another whose code we plan to open soon. I leave the link to the open project below. It is a PubSub message broker based on gRPC, and the idea of using Wasm is to connect it to topics and be able to execute "Serverless Functions" (at the risk of sounding cliché). The code that will use Wasm has not yet been merged into the main branch. As I said, it is still a work in progress, but we have already managed to connect producers to consumers and send and receive messages.

https://github.com/eigr/Astreu

tessi commented 3 years ago

Very cool :) I wish you best of luck and success with Astreu! I'm very interested if things work out for you.

Anyways, considering this issue, I think we're done. Please re-open or create a new issue if you discover any bugs or weirdnesses.

sleipnir commented 3 years ago

Again I thank you for your attention and I will keep you informed. Unfortunately I was very busy with other things today and I haven't been able to test it yet, but I believe it will work accordingly and again I will keep you informed

sleipnir commented 3 years ago

Hello @tessi I managed to reproduce the example perfectly. But I'm still not used to the Wasmex API enough to feel safe moving forward. I still have a doubt. How is the call to a function written in wasm that accepts certain parameters like:

pub extern "C" fn string_receive_and_result_bytes(bytes: *const u8) -> *const u8 {

I can read and write in memory but the call_function API (Wasmex.call_function(instance, "string_receive_and_result_bytes", [])) still asks for parameters that I must pass to the wasm function. How do I do that? Sorry if the answer can be very obvious but I still have this difficulty in understanding how the parameters are passed to the function.

tessi commented 3 years ago

No worries asking :)

You wrote your string to memory, but (as you may see in your function signature) calling that function needs a param. It wants to have the pointer to the place in memory where you stored the string.

So if you did:

in_str_pointer = 0 # you can make this one up. if there is nothing else in memory, `0` is a good value. otherwise be careful not to overwrite existing data in memory.
in_str = "Hello World"
:ok = WasmBinaryTrampoline.write_binary(memory, in_str_pointer, in_str)

0 is the pointer (the very first byte in memory). If there may already be other data in the early bytes of memory, you need to use a different pointer.

Calling the function would be

{:ok, [out_str_pointer]} = Wasmex.call_function(instance, :string_receive_and_result_bytes, [in_str_pointer])
out_str = WasmBinaryTrampoline.read_binary(memory, out_str_pointer)
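For completeness, here is a sketch of what the wasm side of `string_receive_and_result_bytes` might look like in Rust when it follows the size-prefixed protocol from above. The body (uppercasing the input) and the helper `read_prefixed` are made up for illustration, and the `#[no_mangle]` attribute needed for a real export is left out. Two assumptions are worth stating loudly: the result buffer is leaked on purpose so that it stays valid after the function returns (a real module would need a way to free it), and Rust treats address 0 as a null pointer, so with a Rust guest a non-zero input pointer is probably the safer choice.

```rust
use std::slice;

/// Size of the fixed header: one 64-bit unsigned integer, big endian.
const HEADER_LEN: usize = 8;

/// Hypothetical helper: reads the 8-byte header at `ptr` and returns the
/// payload bytes that follow it.
///
/// # Safety
/// `ptr` must point to a complete size-prefixed buffer.
pub unsafe fn read_prefixed<'a>(ptr: *const u8) -> &'a [u8] {
    unsafe {
        let mut header = [0u8; HEADER_LEN];
        header.copy_from_slice(slice::from_raw_parts(ptr, HEADER_LEN));
        let size = u64::from_be_bytes(header) as usize;
        slice::from_raw_parts(ptr.add(HEADER_LEN), size)
    }
}

/// Illustrative body: takes a size-prefixed input, uppercases its ASCII
/// letters and returns a pointer to a new size-prefixed output buffer.
pub extern "C" fn string_receive_and_result_bytes(bytes: *const u8) -> *const u8 {
    // Safety: the host must have written a complete size-prefixed buffer.
    let input = unsafe { read_prefixed(bytes) };
    let output = input.to_ascii_uppercase();

    let mut framed = Vec::with_capacity(HEADER_LEN + output.len());
    framed.extend_from_slice(&(output.len() as u64).to_be_bytes());
    framed.extend_from_slice(&output);

    // Leak the buffer so the returned pointer stays valid for the host.
    Box::leak(framed.into_boxed_slice()).as_ptr()
}
```

The host would write the size-prefixed input to memory, pass its pointer to the function, and read the size-prefixed result starting at the returned pointer.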

sleipnir commented 3 years ago

Thanks again for the excellent response, now all the pieces fit together perfectly, I had not noticed that the parameter is just a pointer to the memory