wasm3 / wasm3

🚀 A fast WebAssembly interpreter and the most universal WASM runtime

Help: Passing strings to host function in C #340

Closed konsumer closed 2 years ago

konsumer commented 2 years ago

I am new to using wasm3 & not great with C, so forgive a dumb question. I have asked around in my circles of people who do C stuff, and no one seems to know how to fix it.

I have a very simple wasm, that works fine in js (browser and node) and I can't seem to make it work right in C.

In this example, I want a very simple log function that can log strings.

Here is the assemblyscript:

@external("env", "null0_log")
declare function log(message: string): void

export function init(): void  {
  log("init")
}

Here is the wat:

(module
 (type $i32_=>_none (func (param i32)))
 (type $none_=>_none (func))
 (import "env" "null0_log" (func $example/index/log (param i32)))
 (memory $0 1)
 (data (i32.const 1036) "\1c")
 (data (i32.const 1048) "\01\00\00\00\08\00\00\00i\00n\00i\00t")
 (export "init" (func $example/index/init))
 (export "memory" (memory $0))
 (func $example/index/init
  i32.const 1056
  call $example/index/log
 )
)

In C, it only outputs the first character, with this:

static m3ApiRawFunction (null0_log) {
  m3ApiGetArgMem(const char*, str);
  printf("%s", str);
  m3ApiSuccess();
}

The strlen of str is 1, too. Here is the complete code.

Here is a working loader in nodejs:

import { readFile } from 'fs/promises'

const env = {
  null0_log: s => console.log(__liftString(s))
}

const { instance, module } = await WebAssembly.instantiate(await readFile('example/build/simple.wasm'), { env })

// this essentially came from the generated assemblyscript wrapper
function __liftString(pointer) {
  if (!pointer) return null;
  const
    end = pointer + new Uint32Array(instance.exports.memory.buffer)[pointer - 4 >>> 2] >>> 1,
    memoryU16 = new Uint16Array(instance.exports.memory.buffer);
  let
    start = pointer >>> 1,
    string = "";
  while (end - start > 1024) string += String.fromCharCode(...memoryU16.subarray(start, start += 1024));
  return string + String.fromCharCode(...memoryU16.subarray(start, end));
}

instance.exports.init()

I think the part I am missing is how to make __liftString in C. Doesn't m3ApiGetArgMem do this?
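For context: as far as I can tell, m3ApiGetArgMem only turns the i32 offset into a host pointer; it does not decode the string. The sketch below copies the data segment from the wat above into a plain byte array (no wasm3 involved) to show why strlen reports 1 and where the length lives. The names example_memory, as_string_byte_length and describe_example are made up for illustration, and the layout (a little-endian byte length in the 4 bytes before the string data) is read off this one example rather than taken from a spec.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bytes of the wat data segment at offset 1048, plus trailing zeroed memory. */
static const uint8_t example_memory[] = {
  0x01, 0x00, 0x00, 0x00,          /* 1048: header word before the length */
  0x08, 0x00, 0x00, 0x00,          /* 1052: string length in bytes (8)    */
  'i', 0, 'n', 0, 'i', 0, 't', 0,  /* 1056: "init" as UTF-16LE            */
  0, 0
};

/* Hypothetical helper: read the little-endian u32 stored in the 4 bytes
   just before the string data. */
static uint32_t as_string_byte_length(const uint8_t *str) {
  assert(str != NULL);
  const uint8_t *p = str - 4;
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical demo: print what a C host sees at the pointer it receives. */
static void describe_example(void) {
  const uint8_t *str = example_memory + 8;  /* corresponds to i32.const 1056 */
  /* strlen stops at the zero high byte of the first UTF-16 code unit. */
  printf("strlen: %zu\n", strlen((const char *)str));
  printf("byte length: %u, UTF-16 code units: %u\n",
         (unsigned)as_string_byte_length(str),
         (unsigned)(as_string_byte_length(str) / 2));
}
```

Calling describe_example() from a main function prints both values; the byte length divided by 2 is the number of UTF-16 code units to decode.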

konsumer commented 2 years ago

It appears that it's interleaved with 0s, so this works:

static m3ApiRawFunction (null0_log) {
  m3ApiGetArgMem(const char*, str);

  for( int i = 0; i < 1024; i+=2 ){
    printf( "%c", *( str + i ) );
    if( *( str + i ) == '\0' ){
      printf( "\n" );
      break;
    }
  }

  m3ApiSuccess();
}

but there is probably a more proper C way to do this that I don't know about.

MaxGraey commented 2 years ago

Something like this in C++:

m3ApiRawFunction (m3_log) {
    m3ApiGetArgMem  (const uint32_t*, ptr)
    uint32_t lenInPoints = *(ptr - 1) / 2;

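    // (pointer, count) constructor: copies exactly lenInPoints code units
    // without scanning for a null terminator
    std::u16string strUtf16(reinterpret_cast<const char16_t*>(ptr), lenInPoints);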
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> converter;
    std::cout << converter.to_bytes(strUtf16) << std::endl;

    m3ApiSuccess();
}

For plain C you will probably need an extra library that can convert UTF-16LE to UTF-8.
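If pulling in a library is not an option, the conversion can also be hand-rolled. Here is a minimal sketch in plain C, assuming the input is UTF-16LE and the caller already knows the number of code units (for example from the length stored before the string). The name utf16le_to_utf8 is hypothetical, and the choice to replace unpaired surrogates with U+FFFD and to truncate when the buffer is full is mine, not anything wasm3 or AssemblyScript prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert `units` UTF-16LE code units at `src` into
   NUL-terminated UTF-8 in `out`. Returns the number of bytes written, not
   counting the terminator. Truncates at a character boundary if `out` is
   too small; unpaired surrogates become U+FFFD. */
static size_t utf16le_to_utf8(const uint8_t *src, size_t units,
                              char *out, size_t out_size) {
  assert(out != NULL && out_size > 0);
  size_t o = 0;
  for (size_t i = 0; i < units; i++) {
    uint32_t cp = (uint32_t)src[2 * i] | ((uint32_t)src[2 * i + 1] << 8);
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < units) {
      /* High surrogate: try to combine it with the following low surrogate. */
      uint32_t lo = (uint32_t)src[2 * i + 2] | ((uint32_t)src[2 * i + 3] << 8);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        i++;
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;  /* lone surrogate */
    }
    size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (o + need >= out_size) break;  /* keep room for the terminator */
    if (need == 1) {
      out[o++] = (char)cp;
    } else if (need == 2) {
      out[o++] = (char)(0xC0 | (cp >> 6));
      out[o++] = (char)(0x80 | (cp & 0x3F));
    } else if (need == 3) {
      out[o++] = (char)(0xE0 | (cp >> 12));
      out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
      out[o++] = (char)(0x80 | (cp & 0x3F));
    } else {
      out[o++] = (char)(0xF0 | (cp >> 18));
      out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
      out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
      out[o++] = (char)(0x80 | (cp & 0x3F));
    }
  }
  out[o] = '\0';
  return o;
}
```

In a host function this would be called with the pointer from m3ApiGetArgMem (as const uint8_t*) and the byte length found in the 4 bytes before it, divided by 2.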

konsumer commented 2 years ago

Ah, nice. Thanks!

I ended up using this in C. It's probably not the best solution, but it seems to work:

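// Naive UTF-16LE -> ASCII: copies the low byte of each 2-byte code unit.
// Stops at the first code unit whose low byte is zero or when out is full,
// so it relies on zeroed memory after the string; non-ASCII gets mangled.
void null0_string(const char* str, char* out, size_t out_size) {
  size_t i = 0;
  while (i + 1 < out_size && str[i * 2] != '\0') {
    out[i] = str[i * 2];
    i++;
  }
  out[i] = '\0';
}

// usage example
static m3ApiRawFunction (null0_log) {
  m3ApiGetArgMem(const char*, _stext);
  char text[1024];
  // sizeof works here because text is an array, not a pointer
  null0_string(_stext, text, sizeof(text));
  printf("%s", text);  // never pass guest data as the format string
  m3ApiSuccess();
}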

Getting the length with *(ptr - 1) / 2 seems handy, so that would probably be better than going by the size of the output variable.

I am generating a few things from a single interface-description, and once I got this, I think most stuff works great. wasm3 rocks!

I will close now. Thanks for taking a look.

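MaxGraey commented 2 years ago

void null0_string(const char* str, char* out, uint32_t len) {
  uint32_t i;
  for (i = 0; i < len; i++) {
    out[i] = *(str + i*2);  // keep only the low byte of each UTF-16 code unit
  }
  out[len] = '\0';  // terminator goes right after the last copied character
}

// usage example
static m3ApiRawFunction (null0_log) {
  m3ApiGetArgMem(const char*, in);
  char out[1024];
  // the byte length sits in the 4 bytes before the string data
  uint32_t len = *(const uint32_t*)(in - 4) / 2;
  if (len > sizeof(out) - 1) len = sizeof(out) - 1;  // don't overflow out
  null0_string(in, out, len);
  printf("%s", out);  // never pass guest data as the format string
  m3ApiSuccess();
}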

But this doesn't support Unicode: characters outside the ASCII range get mangled.

konsumer commented 2 years ago

A friend in raylib discord wrote this and it seems to work better than my lil util.

konsumer commented 1 year ago

I eventually ended up using assemblyscript's encoder, so other wasm compilers can use regular null-terminated utf8 in the same host:

// log a string
@external("env", "null0_log")
declare function null0_log(text: ArrayBuffer): void
export function log(text: string): void {
  null0_log(String.UTF8.encode(text, true))
}

spangaer commented 1 month ago

Should anyone ever run into this with wasm3-rs, these 2 tickets were instrumental in figuring it out.

Documenting it for those that come after me:

Project 1 compiles to wasm32-unknown-unknown; lib.rs contains:

use std::ffi;

extern "C" {
    pub fn print_str_impl(msg: *const ffi::c_char);
}

Obviously you need to do the standard FFI conversions of strs to invoke it. The function has to be called somewhere, otherwise the optimizer removes it.

Project 2 loads the wasm using wasm3-rs; main.rs contains:

use std::ffi;

fn main() {

// ... 
    module
        .link_closure("env", "print_str_impl", |ctx, input: i32| {
            let input = input as usize;
            let msg = unsafe {
                let string_start: *const u8 = &(*ctx.memory())[input];
                ffi::CStr::from_ptr(string_start as *const ffi::c_char)
            };
            let _ = msg.to_str().inspect(|str| println!("{}", str));
            Ok(())
        })
        .expect("Unable to link function");

// ... 
}

So what comes in as a pointer is actually an index into the wasm memory blob. Getting the string back is a matter of indexing the blob and turning that real pointer back into a Rust str, FFI style.

(Of course, I'm saved here by the Rust input side being sure to encode the C string as UTF-8.)
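Since the rest of this thread is C, here is roughly the same idea as a C sketch: treat the incoming i32 as an index into the guest memory and check that a terminator exists before handing the pointer to anything that expects a C string. The helper name guest_cstr and its parameters are hypothetical; mem and mem_size are assumed to come from the runtime (in wasm3's C API, m3_GetMemory should provide them, but verify that against your version).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: treat `offset` as an index into the guest's linear
   memory (`mem`, `mem_size` bytes) and return a host pointer to the
   NUL-terminated string there, or NULL if the offset is out of range or no
   terminator is found before the end of memory. */
static const char *guest_cstr(const uint8_t *mem, size_t mem_size,
                              uint32_t offset) {
  assert(mem != NULL);
  if (offset >= mem_size) return NULL;
  const uint8_t *start = mem + offset;
  /* memchr stops at the end of guest memory, unlike strlen or printf("%s"). */
  if (memchr(start, '\0', mem_size - offset) == NULL) return NULL;
  return (const char *)start;
}
```

This only guarantees the string is terminated inside guest memory; whether the bytes are valid UTF-8 is still up to the guest, as noted above.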