Closed sleipnir closed 3 years ago
Hey @sleipnir thanks for asking so nicely 💚
Passing and receiving memory really isn't as easy as I wish it was. So, sadly, I don't have an easy solution for you, but a lengthy post. I hope this still helps (and, if helpful, it could be something I point others to who have the same question). The good news is that there is a silver lining (see the footnote).
Handling strings is not easy because WebAssembly does not know "strings" (as we understand them in Elixir), but only sees "a bunch of bytes". This means, when looking at our WebAssembly memory, the whole memory is just "a big array of bytes". So we need to know where to start reading the string from memory (`memory_position`, aka `pointer`) and how many bytes to read (`length`).
There are two ways to pass this information:
1. Terminate the string with a zero byte (`"\0"`) and read the memory starting from our `pointer` until we discover a zero byte. The upside is that we only need to pass one argument to/from WebAssembly (the `pointer`). But the downside is that it opens the door wide for mistakes (what if we forgot a zero byte, what if our string contains zero bytes, what if an attacker manages to sneak in or remove some zero bytes, etc.).
2. Pass the `pointer` and `length` (in bytes) of a string. This has the upside of being way more secure (against attackers and programming mistakes) but the downside of us needing to always pass two variables to/from WebAssembly. This is, by the way, how Rust handles `String`s internally.

For wasmex, we decided to go with the second approach. This works well for passing strings down to WebAssembly. Given we have this method in WebAssembly (implemented in Rust, but could be any language compiling to wasm):
#[no_mangle]
pub extern "C" fn do_something_with_a_string(bytes: *const u8, length: usize) -> u8 {
    // do something with the given byte-array and string length
    0 // placeholder return value so the example compiles
}
We could call the method in Elixir like this:
{:ok, memory} = Wasmex.memory(instance, :uint8, 0)
string = "hello, world"
memory_position = 42 # aka "pointer"
Wasmex.Memory.write_binary(memory, memory_position, string) # copy the bytes to WASM memory, so our WASM function can see it
Wasmex.call_function(instance, :do_something_with_a_string, [memory_position, byte_size(string)]) # length in bytes, not graphemes (String.length/1 counts graphemes)
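For illustration, here is a minimal sketch of what the Rust side might do with those two arguments. The function name `char_count` is made up, and the `#[no_mangle]` attribute needed to export it from a wasm module is left out so the snippet also compiles as plain Rust. It assumes the host really wrote `length` bytes at the given position:

```rust
use std::slice;
use std::str;

/// Hypothetical guest function: returns the number of characters in the
/// UTF-8 string stored at `bytes`, or 0 if the bytes are not valid UTF-8.
/// Add `#[no_mangle]` when exporting it from a wasm module.
pub extern "C" fn char_count(bytes: *const u8, length: usize) -> u32 {
    // SAFETY (assumption): the caller wrote `length` readable bytes at `bytes`.
    let raw = unsafe { slice::from_raw_parts(bytes, length) };
    match str::from_utf8(raw) {
        Ok(text) => text.chars().count() as u32,
        Err(_) => 0,
    }
}
```

This also shows why the length has to be given in bytes: "héllo" is 5 characters but 6 bytes in UTF-8, so passing 5 would cut the string short.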
The other way around (handing a string from WebAssembly back to Elixir) is more complicated, because we can currently only return one value from wasm functions, but we need two values (`pointer` and `length`). (see 👣 footnotes for details)
This is often solved by having two functions in WebAssembly: One function producing a string, and another function returning the string-length.
#[no_mangle]
pub extern "C" fn demo_string() -> *const u8 {
    b"Hello, World!".as_ptr()
}

#[no_mangle]
pub extern "C" fn demo_string_len() -> u8 {
    13
}
We would use it in Elixir like this:
{:ok, [pointer]} = Wasmex.call_function(instance, :demo_string, [])
{:ok, [length]} = Wasmex.call_function(instance, :demo_string_len, [])
assert Wasmex.Memory.read_string(memory, pointer, length) == "Hello, World!"
Now that you know how to pass strings down to WebAssembly and back to Elixir again, there should be nothing stopping you from combining both approaches.
If your specific use-case requires it, you can of course go the "C strings" way of ending strings with zero-bytes or invent any other custom protocol (e.g. prefixing a string so that the first byte is the string length). These custom routes don't have helper methods in wasmex, though.
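As a rough sketch of the "C strings" route on the Rust side, the standard library's `CStr` can scan for the terminating zero byte. The function name is hypothetical, and `#[no_mangle]` is omitted so the snippet compiles as plain Rust:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical guest function: returns the byte length of the
/// zero-terminated string starting at `ptr` (the zero byte is not counted).
pub extern "C" fn c_string_byte_len(ptr: *const u8) -> usize {
    // SAFETY (assumption): the host wrote a zero byte somewhere after `ptr`.
    // If it did not, this reads past the intended data.
    let c_str = unsafe { CStr::from_ptr(ptr as *const c_char) };
    c_str.to_bytes().len()
}
```

On the Elixir side you would then write `string <> <<0>>` to memory and pass only the pointer.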
👣 This is only half-true. In fact, the WebAssembly standard already allows returning multiple values. `Wasmer`, the WebAssembly execution engine we use, already partially implements that. Unfortunately, we don't have that feature yet in singlepass compilation (see https://docs.wasmer.io/ecosystem/wasmer/wasmer-features). Once wasmer supports multi-value returns everywhere, we can build better helper methods in Elixir to make the whole process way easier.
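Until multi-value returns are available, one workaround that is sometimes used is to pack pointer and length into a single 64-bit return value. This is not something wasmex provides, only a sketch. It assumes a wasm32 target, where pointers and lengths fit in 32 bits, and the helper names are made up:

```rust
/// Packs a 32-bit pointer and a 32-bit length into one u64:
/// pointer in the upper 32 bits, length in the lower 32 bits.
pub fn pack_ptr_len(ptr: u32, len: u32) -> u64 {
    ((ptr as u64) << 32) | (len as u64)
}

/// Reverses `pack_ptr_len`, returning `(pointer, length)`.
pub fn unpack_ptr_len(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, (packed & 0xffff_ffff) as u32)
}
```

The host would have to split the returned 64-bit integer again in the same way, keeping in mind that it may arrive as a signed value.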
Hello, thanks for the detailed information.
Reading your answer, I think I was not entirely clear about my use case, so let me try to explain it properly. I need to call functions from wasm modules with arbitrary types and receive arbitrary types. These wasm modules would be developed by third parties implementing a protocol for my application, so they would be loaded and executed dynamically by my implementation. Unfortunately, this is only possible with WebAssembly if we use Interface Types, which are not yet fully supported by the largest wasm runtimes (as far as I know). So I thought about using strings: I could serialize my arbitrary types (to be more precise, Protobuf types) to strings (protobuf can easily be used with JSON), pass them to the functions, receive the result as a string, and then transform it back into protobuf types. I am aware that this approach would not be the most efficient, but while we do not have support for interface types in wasm, I think it is one of the few viable alternatives. Unfortunately, expecting customers to write methods that return the string length seems to complicate things for me, and I don't know if that would be an option.
Maybe if I could read all the memory, knowing that I am getting arbitrary bytes, I could directly transform the result into my Protobuf types (since protobufs use bytes directly) instead of working with strings. Did I explain that clearly?
Alright, I think I understand better. You want to pass arbitrary info into WebAssembly and back up again using protobufs. You can serialize your proto objects to JSON, but theoretically also to any string or byte sequence (to save some space, or to be more time-efficient).
What about a custom byte serialization where you write the following to wasm memory:
00 00 00 00 00 00 00 05 48 65 6c 6c 6f
|---------------------| |------------|
           |                  |
size (64 bit unsigned int)    |
                              |
              bytes, must be `size` bytes long
This way, you can pass one pointer to wasm where the first 8 bytes encode the size of the following byte array. The following byte array could contain raw bytes (probably most efficient in your case) or json (better debugging as you can read the content).
The example above should decode to a string of size `5` containing the byte values for "Hello".
Since the "header" part containing the string size is fixed-size, it's hopefully easy to de-/serialize this format. What do you think?
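To make the format concrete, here is a minimal sketch of encoding and decoding it in Rust, as a guest module might. The function names `frame` and `unframe` are made up for illustration:

```rust
/// Prefixes `payload` with its length as an 8-byte big-endian unsigned integer.
pub fn frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u64).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Returns the payload of a framed buffer, or `None` if the buffer is
/// shorter than the header or than the size the header announces.
pub fn unframe(buffer: &[u8]) -> Option<&[u8]> {
    if buffer.len() < 8 {
        return None;
    }
    let mut header = [0u8; 8];
    header.copy_from_slice(&buffer[..8]);
    // Note: on 32-bit targets this cast truncates sizes above u32::MAX.
    let size = u64::from_be_bytes(header) as usize;
    buffer.get(8..)?.get(..size)
}
```

Framing the five bytes of "Hello" this way yields exactly the 13 bytes shown in the diagram above.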
@tessi Thank you for your kind and complete answer, I think it might be worth a try, I just had doubts about how it would look using Wasmex? Sorry if I didn't fully understand.
I am not exactly sure if it fits your use case, but I sketched together a wrapper module that wraps `Wasmex.Memory` to implement the protocol mentioned above:
defmodule WasmBinaryTrampoline do
  @moduledoc """
  Assuming we have a :uint8-type memory, this module offers ways to write and read binaries
  to/from wasm memory.

  Binaries are written in two parts:
  1. size of the binary (8 bytes, unsigned int, big endian)
  2. binary content (having exactly the number of bytes given in `size`)

  We do *not* ensure that writing/reading memory fits the wasm memory bounds.
  """

  def write_binary(memory, index, binary) when is_binary(binary) do
    length = byte_size(binary)
    length_bytes = <<length::64-big>> # 8 bytes, big endian
    Wasmex.Memory.write_binary(memory, index, length_bytes)
    Wasmex.Memory.write_binary(memory, index + 8, binary)
  end

  def read_binary(memory, index) do
    # the first 8 bytes hold the size of the content that follows
    length =
      memory
      |> Wasmex.Memory.read_binary(index, 8)
      |> :binary.decode_unsigned(:big)

    Wasmex.Memory.read_binary(memory, index + 8, length)
  end
end
Note: Name of the module might be a little weird (I'm open for suggestions here ;)) and I just tested it briefly.
If you open `iex` inside the wasmex repository root directory, the following should work:
bytes = File.read!("test/wasm_test/target/wasm32-unknown-unknown/debug/wasmex_test.wasm")
{:ok, instance } = Wasmex.start_link(bytes)
{:ok, memory} = Wasmex.memory(instance, :uint8, 0)
WasmBinaryTrampoline.write_binary(memory, 0, "Hello") # :ok
WasmBinaryTrampoline.read_binary(memory, 0) # "Hello"
# just a test to see what we actually wrote to memory,
# we see the first 8 bytes being the size followed by actual content
Wasmex.Memory.read_binary(memory, 0, 8 + 5) # <<0, 0, 0, 0, 0, 0, 0, 5, 72, 101, 108, 108, 111>>
Hope it helps you. Also: if whatever you are building is open source (actually also if not) and you want to tell, I'd be very interested in what you are using wasmex for. It really motivates me to hear what this library is used for.
Hi @tessi, this is awesome. I am really happy with all your attention to my question. I will try to use the module.
Yes, we have two projects: one is already open but still very, very early (WIP), and another one whose code we plan to open soon. I leave here the link below of the project already open; it is a PubSub message broker based on gRPC. The idea of using wasm is to connect it to topics and be able to execute "Serverless Functions" (at the risk of sounding cliché). The code that will use wasm has not yet been merged into the main branch; as I said, it is still a work in progress, but we have already managed to connect producers to consumers and send and receive messages.
Very cool :) I wish you best of luck and success with Astreu! I'm very interested if things work out for you.
Anyways, considering this issue, I think we're done. Please re-open or create a new issue if you discover any bugs or weirdnesses.
Again I thank you for your attention and I will keep you informed. Unfortunately I was very busy with other things today and I haven't been able to test it yet, but I believe it will work accordingly and again I will keep you informed
Hello @tessi, I managed to reproduce the example perfectly. But I'm still not used to the Wasmex API enough to feel safe moving forward, and I still have one question: how do I call a function written in wasm that accepts parameters, like:
pub extern "C" fn string_receive_and_result_bytes(bytes: *const u8) -> *const u8 {
I can read and write in memory but the call_function API (Wasmex.call_function(instance, "string_receive_and_result_bytes", [])) still asks for parameters that I must pass to the wasm function. How do I do that? Sorry if the answer can be very obvious but I still have this difficulty in understanding how the parameters are passed to the function.
No worries asking :)
You wrote your string to memory, but (as you may see in your function signature) calling that function needs a param. It wants to have the pointer to the place in memory where you stored the string.
So if you did:
in_str_pointer = 0 # you can make this one up. if there is nothing else in memory, `0` is a good value. otherwise be careful not to overwrite existing data in memory.
in_str = "Hello World"
:ok = WasmBinaryTrampoline.write_binary(memory, in_str_pointer, in_str)
`0` is the pointer (the very first byte in memory). If there is already other data in the early bytes of memory, you need to use a different pointer.
Calling the function would be:
{:ok, [out_str_pointer]} = Wasmex.call_function(instance, :string_receive_and_result_bytes, [in_str_pointer])
out_str = WasmBinaryTrampoline.read_binary(memory, out_str_pointer)
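For reference, the wasm side of such a call could look roughly like the sketch below. The body is an assumption for illustration only: it upper-cases ASCII input as a stand-in for real work. The `#[no_mangle]` attribute needed to export the function is omitted so the snippet compiles as plain Rust. It reads the 8-byte size header, processes the payload, and returns a pointer to a newly allocated buffer in the same size-prefixed format:

```rust
use std::slice;

/// Hypothetical guest implementation: reads a size-prefixed input and
/// returns a pointer to a size-prefixed, ASCII-upper-cased copy of it.
pub extern "C" fn string_receive_and_result_bytes(bytes: *const u8) -> *const u8 {
    // Read the 8-byte big-endian size header.
    // SAFETY (assumption): the host wrote a header plus `size` bytes at `bytes`.
    let mut header = [0u8; 8];
    header.copy_from_slice(unsafe { slice::from_raw_parts(bytes, 8) });
    let size = u64::from_be_bytes(header) as usize;
    let input = unsafe { slice::from_raw_parts(bytes.add(8), size) };

    // Stand-in for real work on the input.
    let result = input.to_ascii_uppercase();

    // Build the size-prefixed output buffer.
    let mut out = Vec::with_capacity(8 + result.len());
    out.extend_from_slice(&(result.len() as u64).to_be_bytes());
    out.extend_from_slice(&result);

    // Leak the buffer so it stays valid after the function returns.
    Box::leak(out.into_boxed_slice()).as_ptr()
}
```

Two caveats: this sketch never frees the returned buffer, so a real module would also want to export a way to release it. And Rust treats address `0` as a null pointer, which `slice::from_raw_parts` does not allow, so with a Rust guest it is safer to write the input at a non-zero position.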
Thanks again for the excellent response, now all the pieces fit together perfectly, I had not noticed that the parameter is just a pointer to the memory
First of all I would like to thank you for this excellent library. I need to call a function that takes a string as an argument and that returns a string as a result. In the examples I only saw the use of one or the other, never passing and receiving strings. It may seem trivial to expand, but I'm a little confused about the WASM API and the use of Memory to achieve this goal. Could someone help me?