malcolmstill / zware

Zig WebAssembly Runtime Engine
MIT License

A little help about passing string back and forth #222

Open jeromepin opened 2 months ago

jeromepin commented 2 months ago

First, thank you for this library. It works great, and the user-facing code is quite easy to grasp, even for a Zig and wasm newcomer like myself.

I would like to get help on how to pass strings (and therefore, serialized data structure) back and forth between the host and the module.

I understand the only way is to put the string into the memory and pass addresses and offsets to tell the module where to find the string, but I can't seem to achieve it.

I tried with Instance.getMemory() and Memory.write(), but the write function only handles numbers :( I believe I should be able to write from zware into the DATA section and then... I'm not sure, and I'm lost.

malcolmstill commented 2 months ago

Hi @jeromepin,

Sorry for the delay in replying. I've been in Milan all week for https://sycl.it and haven't had a chance to take a look at the discussion/issue.

Does Memory.copy() fulfil your needs?

If that doesn't do what you need, I can help further, but in that case it would be useful to know what language you are compiling to wasm and if there was some illustrative example of what you are trying / expecting and I can hopefully point you in the right direction.

> the user-facing code is quite easy to grasp, even for a zig and wasm newcomer as myself

Glad to hear it, but if you do find things that would make it easier please let me know!

jeromepin commented 2 months ago

Hi @malcolmstill,

I'm the one who is sorry. My impatience got the better of me, and I should definitely have waited longer. I apologize.

Memory.copy() seems interesting, I'll have to test it! I'm not sure how to use it, though. How do you find an address you can use? Then I give that address to invoke, right? And how can the wasm program access the data segment? I mean, wasm's specification documents it clearly, but what about the language that will be compiled to wasm?

> it would be useful to know what language you are compiling to wasm

I'm using Zig too. But I thought it wasn't relevant, since every language should create a wasm binary similarly, no?

> if there was some illustrative example of what you are trying / expecting and I can hopefully point you in the right direction.

For now I'm toying around with everything. The end goal is to write some kind of query in any language, compile it to wasm, read the query from the host language and apply it to a data source. Then pass the results back to the wasm binary. Hence the need to be able to serialize and pass things back and forth.

malcolmstill commented 2 months ago

@jeromepin

> I apologize.

Absolutely no need to apologise!

> I'm using Zig too. But I thought it wasn't relevant, since every language should create a wasm binary similarly, no?

So fundamentally this sort of operation at the lowest level will likely amount to a Memory.copy() given some source data (which will give you the length) and a destination pointer.
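To make the "source data plus destination pointer" idea concrete, here is a minimal sketch in Rust that models linear memory as a plain byte slice. This is not zware's API: `copy_into_memory` and its error type are hypothetical names, used only to show the bounds check such a copy needs.

```rust
/// Copies `src` into `memory`, starting at byte offset `dest`.
/// `memory` stands in for a wasm instance's linear memory (an assumption of this sketch).
fn copy_into_memory(memory: &mut [u8], dest: usize, src: &[u8]) -> Result<(), String> {
    // The length comes from the source data; checked_add guards against overflow.
    let end = dest
        .checked_add(src.len())
        .ok_or_else(|| "destination range overflows".to_string())?;
    // Refuse to write past the end of linear memory.
    if end > memory.len() {
        return Err(format!(
            "write of {} bytes at offset {} exceeds memory size {}",
            src.len(),
            dest,
            memory.len()
        ));
    }
    memory[dest..end].copy_from_slice(src);
    Ok(())
}
```

Note that the check only catches out-of-bounds writes. It cannot tell whether the destination range is already in use by the guest.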

However, the host (Zig) side and the target language may expect different encodings. E.g. Rust's string type is UTF-8 encoded, but I think AssemblyScript's string type is UTF-16.
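As a small illustration of why the encoding matters, the sketch below (Rust, standard library only; the helper names are made up for this example) shows that the same string has a different length, and different bytes, depending on the encoding the guest expects.

```rust
/// Length of `s` in bytes when encoded as UTF-8 (what a Rust `&str` holds).
fn utf8_len(s: &str) -> usize {
    s.len()
}

/// Length of `s` in 16-bit code units when encoded as UTF-16.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Encodes `s` as little-endian UTF-16 bytes, e.g. for a guest that expects UTF-16.
fn to_utf16_le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}
```

So the "len" passed alongside the pointer has to be agreed on as well: bytes for a UTF-8 guest, code units (or bytes of the UTF-16 buffer) for a UTF-16 one.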

But the more obvious issue, as you are asking, is where in the WebAssembly module's memory to put the string. We can't put it just anywhere, because we may corrupt memory if that region is already in use. Note: in very simple scenarios there may be nothing else in the linear memory, so you could start at offset 0, or you could reserve a fixed location in memory for a specific string so that you know statically where to put it; but in general you need to do a bit more work.

Let's maybe look at how Rust does it with a JavaScript host. We define a function:

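use wasm_bindgen::prelude::*;

// Exported to JavaScript by wasm-bindgen; returns the string's length in bytes.
#[wasm_bindgen]
pub fn name_length(name: &str) -> usize {
    name.len()
}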

If we build it with some of the standard Rust WebAssembly tooling (wasm-bindgen, wasm-pack), we get a function name_length(name) that is callable from JavaScript.

However, wasm-bindgen is doing a lot of work to make this bridge simple for us, because under the hood the JavaScript function name_length is actually:

export function name_length(name) {
    const ptr0 = passStringToWasm0(name, wasm.__wbindgen_malloc, wasm.__wbindgen_realloc);
    const len0 = WASM_VECTOR_LEN;
    const ret = wasm.name_length(ptr0, len0);
    return ret >>> 0;
}

where passStringToWasm0 is defined as:

function passStringToWasm0(arg, malloc, realloc) {

    if (realloc === undefined) {
        const buf = cachedTextEncoder.encode(arg);
        const ptr = malloc(buf.length, 1) >>> 0;
        getUint8Memory0().subarray(ptr, ptr + buf.length).set(buf);
        WASM_VECTOR_LEN = buf.length;
        return ptr;
    }

    let len = arg.length;
    let ptr = malloc(len, 1) >>> 0;

    const mem = getUint8Memory0();

    let offset = 0;

    for (; offset < len; offset++) {
        const code = arg.charCodeAt(offset);
        if (code > 0x7F) break;
        mem[ptr + offset] = code;
    }

    if (offset !== len) {
        if (offset !== 0) {
            arg = arg.slice(offset);
        }
        ptr = realloc(ptr, len, len = offset + arg.length * 3, 1) >>> 0;
        const view = getUint8Memory0().subarray(ptr + offset, ptr + len);
        const ret = encodeString(arg, view);

        offset += ret.written;
        ptr = realloc(ptr, len, offset, 1) >>> 0;
    }

    WASM_VECTOR_LEN = offset;
    return ptr;
}

As you can see, the compiled WebAssembly function name_length doesn't actually take a &str; it takes two integers, ptr0 and len0.
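Roughly speaking, the guest side then has to rebuild the string view from those two integers. The sketch below is not wasm-bindgen's actual generated code; `name_length_raw` is a hypothetical name, and returning 0 for invalid UTF-8 is an arbitrary choice made for this example.

```rust
use std::{slice, str};

/// Shape of the function after lowering: a pointer and a byte length
/// instead of a `&str`.
///
/// # Safety
/// `ptr` must point to `len` readable bytes.
pub unsafe extern "C" fn name_length_raw(ptr: *const u8, len: usize) -> usize {
    // Rebuild a byte slice over the region the host filled in.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    // Validate the bytes as UTF-8 before treating them as a string.
    match str::from_utf8(bytes) {
        Ok(name) => name.len(),
        Err(_) => 0, // arbitrary fallback for this sketch
    }
}
```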

So what's happening here is that the Rust WebAssembly module also exposes a malloc function. The JavaScript side calls wasm.__wbindgen_malloc with some length, which asks the allocator compiled into the WebAssembly module to reserve space on its heap and returns a pointer to it; the JavaScript side then copies the encoded string into that area of linear memory. Now that it has a pointer to that area, it can call wasm.name_length(ptr0, len0). In other words, a relatively simple operation is actually quite involved, but wasm-bindgen hides a lot of that complexity.
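The three steps (ask the guest for space, copy, call with pointer and length) can be sketched with a toy model. Everything here is an assumption for illustration: `ToyGuest` stands in for a wasm instance, its bump allocator stands in for the exported malloc, and none of it is zware's or wasm-bindgen's API.

```rust
/// Toy stand-in for a wasm instance: linear memory plus a bump allocator.
struct ToyGuest {
    memory: Vec<u8>,
    heap_top: usize,
}

impl ToyGuest {
    fn new(size: usize) -> Self {
        ToyGuest { memory: vec![0; size], heap_top: 0 }
    }

    /// Plays the role of the exported malloc: reserves `len` bytes and
    /// returns their offset, or None if memory is exhausted.
    fn malloc(&mut self, len: usize) -> Option<usize> {
        let end = self.heap_top.checked_add(len)?;
        if end > self.memory.len() {
            return None;
        }
        let ptr = self.heap_top;
        self.heap_top = end;
        Some(ptr)
    }

    /// Plays the role of the exported name_length(ptr, len).
    fn name_length(&self, ptr: usize, len: usize) -> usize {
        std::str::from_utf8(&self.memory[ptr..ptr + len]).map_or(0, |s| s.len())
    }
}

/// Host side: allocate in the guest, copy the bytes, then call with (ptr, len).
fn pass_string(guest: &mut ToyGuest, name: &str) -> Option<usize> {
    let bytes = name.as_bytes();
    let ptr = guest.malloc(bytes.len())?; // 1. guest reserves the space
    guest.memory[ptr..ptr + bytes.len()].copy_from_slice(bytes); // 2. host copies
    Some(guest.name_length(ptr, bytes.len())) // 3. call with pointer and length
}
```

Because the guest's own allocator hands out the region, the host never has to guess which parts of linear memory are free.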

Now in Zig I don't believe there is anything at the moment like wasm-bindgen, so it requires a more hands-on approach. And even if there were a wasm-bindgen-like thing, it would be specifically for JavaScript as the host, whereas in this case zware is the host.

So, how do we pass a string from Zig / zware (host) to Zig (wasm)? I think what you'll need to do is initialise an allocator within the Zig (wasm) code and export a function that allocates space for the string and returns a pointer. Then you can copy the string to that pointer and pass it, with invoke as you say, to the actual string-processing function, which would have a signature that looks like fn someStringFunction(ptr: [*]u8, len: usize).
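For what it's worth, here is roughly what those guest-side exports could look like, sketched in Rust rather than Zig to stay in the same language as the name_length example above. The names `guest_alloc`, `guest_dealloc` and `some_string_function` are hypothetical, and a Zig guest would have the same overall shape with its own allocator.

```rust
use std::{ptr, slice, str};

// When building for wasm32, each function would also need #[no_mangle]
// (spelled #[unsafe(no_mangle)] on the 2024 edition) to be exported.

/// Reserves `len` zeroed bytes in the guest and hands the pointer to the host.
pub extern "C" fn guest_alloc(len: usize) -> *mut u8 {
    let buf: Box<[u8]> = vec![0u8; len].into_boxed_slice();
    // Leak the box; the host must give it back through guest_dealloc.
    Box::into_raw(buf) as *mut u8
}

/// Frees a region obtained from `guest_alloc`.
///
/// # Safety
/// `ptr` must come from `guest_alloc(len)` with the same `len`, and must not be freed twice.
pub unsafe extern "C" fn guest_dealloc(ptr: *mut u8, len: usize) {
    drop(unsafe { Box::from_raw(ptr::slice_from_raw_parts_mut(ptr, len)) });
}

/// Example string-processing function: counts the characters in the string.
///
/// # Safety
/// `ptr` must point to `len` readable bytes.
pub unsafe extern "C" fn some_string_function(ptr: *const u8, len: usize) -> usize {
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    // Invalid UTF-8 yields 0 here; an arbitrary choice for this sketch.
    str::from_utf8(bytes).map_or(0, |s| s.chars().count())
}
```

The host would then call the allocation export, copy the string into the returned region, call the string function with the same pointer and length, and finally free the region.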

Or again, maybe you can start with a fixed location in memory?
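A fixed location could be as simple as a static buffer inside the guest whose address and capacity are exported, so no allocator is needed. Again a sketch in Rust under assumptions: `INPUT`, `input_ptr`, `input_capacity` and `input_char_count` are made-up names, and the `Sync` impl assumes a single-threaded wasm instance.

```rust
use std::cell::UnsafeCell;
use std::{slice, str};

const INPUT_CAPACITY: usize = 256;

/// Fixed scratch buffer the host is allowed to write into.
struct InputBuffer(UnsafeCell<[u8; INPUT_CAPACITY]>);

// Assumption: the instance is single-threaded, so shared access is fine.
unsafe impl Sync for InputBuffer {}

static INPUT: InputBuffer = InputBuffer(UnsafeCell::new([0; INPUT_CAPACITY]));

/// Tells the host where the buffer lives in linear memory.
pub extern "C" fn input_ptr() -> *mut u8 {
    INPUT.0.get() as *mut u8
}

/// Tells the host how many bytes it may write.
pub extern "C" fn input_capacity() -> usize {
    INPUT_CAPACITY
}

/// Processes the first `len` bytes of the buffer as a string (counts characters).
///
/// # Safety
/// The host must have finished writing before calling this.
pub unsafe extern "C" fn input_char_count(len: usize) -> usize {
    let len = len.min(INPUT_CAPACITY); // never read past the buffer
    let bytes = unsafe { slice::from_raw_parts(input_ptr() as *const u8, len) };
    str::from_utf8(bytes).map_or(0, |s| s.chars().count())
}
```

The trade-off is a hard upper bound on string size and only one string in flight at a time, which may be perfectly fine for a first experiment.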

Does that help at all? Again if you've got some little example that you want to achieve, post the code and we'll see if we can get it working.


jeromepin commented 1 month ago

Oooh boy! What have I gotten myself into... I'm in over my head here.

Ok, first thank you so much for the detailed answer, this is very kind of you to take so much time on these questions.

I will need to dig into this and take time to understand everything.

I don't know how you want to manage your issues, but you can move this to a discussion if you prefer, to not clutter the Issues page 🤷‍♂️ I definitely hope I will be able to try this out, but it could take weeks or even months

malcolmstill commented 1 month ago

Couple of thoughts on issues to spin out:

> I don't know how you want to manage your issues, but you can move this to a discussion if you prefer, to not clutter the Issues page 🤷‍♂️ I definitely hope I will be able to try this out, but it could take weeks or even months

We have few enough issues that I don't particularly mind...I'll just leave it here for the moment.