rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

wasm32-wasip1 depends on libc memset with no_std #130621

Open drebbe-intrepid opened 1 week ago

drebbe-intrepid commented 1 week ago

I've searched the WASM and WASI specifications and can't find what the "correct" behavior is here, but the current Rust behavior seems wrong to me.

I don't believe we should be trying to import memset from the "env" module:

(func $import0 (import "env" "memset") (param i32 i32 i32) (result i32))

The wasm3 engine can't run this module because of that import:

$ wasm3 target/wasm32-wasip1/release/wasm_br_test.wasm 
Error: missing imported function ('env.memset')

Code

#![no_std]
#![no_main]

#[no_mangle]
pub fn _start() {
    let _asdf = [0; 40];
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

Cargo.toml

[lib]
crate-type = ["cdylib"]

[profile.release]
lto = true
#opt-level = 's'
opt-level = 0
codegen-units = 1
panic = "abort"
strip = true

.cargo/config.toml

[build]
target = "wasm32-wasip1"

[target.wasm32-wasip1]
rustflags = ["-C", "link-arg=-zstack-size=65520",]

Meta

rustc --version --verbose:

rustc 1.83.0-nightly (f79a912d9 2024-09-18)
binary: rustc
commit-hash: f79a912d9edc3ad4db910c0e93672ed5c65133fa
commit-date: 2024-09-18
host: x86_64-unknown-linux-gnu
release: 1.83.0-nightly
LLVM version: 19.1.0

wasm generated binary

(module
  (type $type0 (func (param i32 i32 i32) (result i32)))
  (type $type1 (func))
  (func $import0 (import "env" "memset") (param i32 i32 i32) (result i32))
  (table $table0 1 1 funcref)
  (memory $memory0 1)
  (global $global0 (mut i32) (i32.const 65520))
  (export "memory" (memory $memory0))
  (export "_start" (func $func1))
  (func $func1
    (local $var0 i32) (local $var1 i32) (local $var2 i32) (local $var3 i32) (local $var4 i32) (local $var5 i32) (local $var6 i32)
    global.get $global0
    local.set $var0
    i32.const 160
    local.set $var1
    local.get $var0
    local.get $var1
    i32.sub
    local.set $var2
    local.get $var2
    global.set $global0
    i32.const 160
    local.set $var3
    i32.const 0
    local.set $var4
    local.get $var2
    local.get $var4
    local.get $var3
    call $import0
    drop
    i32.const 160
    local.set $var5
    local.get $var2
    local.get $var5
    i32.add
    local.set $var6
    local.get $var6
    global.set $global0
    return
  )
)
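
For reference, the constant 160 passed as the length in the module above is consistent with the array's size: with no other type constraints, the literal in `[0; 40]` is inferred as `i32`, so the array occupies 40 × 4 bytes. A minimal check, assuming that inference (the function name is made up for illustration):

```rust
use core::mem::size_of;

/// Size in bytes of the array created by `let _asdf = [0; 40];`.
/// Assumption: the unsuffixed literal `0` falls back to `i32`.
pub fn zeroed_array_bytes() -> usize {
    size_of::<[i32; 40]>()
}
```

`zeroed_array_bytes()` returns 160, which matches the `i32.const 160` that ends up as the third argument of the call to the memset import.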
alexcrichton commented 1 week ago

Have you tried compiling for the wasm32-unknown-unknown target? For the wasm32-wasip1 target, memset comes from wasi-libc, which you're disabling here through #![no_std]. For the wasm32-unknown-unknown target, it comes from compiler-builtins, which is always linked in.
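
One way to stay on wasm32-wasip1 with #![no_std] might be for the crate to supply the symbol itself. The following is only a minimal sketch, not what wasi-libc or compiler-builtins ship. The name `fill_bytes` is hypothetical, and the assumption (not verified here) is that renaming it to `memset` and adding `#[no_mangle]` would let the linker resolve the call locally instead of emitting the `env.memset` import. Volatile writes are used so that the optimizer does not turn the loop back into a call to memset.

```rust
/// Byte-wise fill with the same contract as C's `memset`:
/// writes the low 8 bits of `c` to `n` bytes starting at `dest`
/// and returns `dest`.
///
/// Hypothetical name. In a no_std wasm32-wasip1 crate this would
/// presumably be exported as `memset` via `#[no_mangle]` (assumption).
///
/// # Safety
/// `dest` must be valid for writes of `n` bytes.
pub unsafe extern "C" fn fill_bytes(dest: *mut u8, c: i32, n: usize) -> *mut u8 {
    let byte = c as u8; // truncates to the low 8 bits, as memset does
    let mut i = 0;
    while i < n {
        // Volatile store: keeps the compiler from recognizing the loop
        // as a memset idiom and replacing it with a call to memset.
        unsafe { core::ptr::write_volatile(dest.add(i), byte) };
        i += 1;
    }
    dest
}
```

A byte-at-a-time loop is slower than the word-based routine that compiler-builtins provides on wasm32-unknown-unknown, so this only makes sense as a stopgap.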

drebbe-intrepid commented 1 week ago

Is there documentation anywhere on wasi-libc being included as part of Rust or the WASI spec?

Here is the wasm32-unknown-unknown output, which drops the memset import; the call now targets a memset defined inside the module ($func1):

(module
  (type $type0 (func))
  (type $type1 (func (param i32 i32 i32) (result i32)))
  (table $table0 1 1 funcref)
  (memory $memory0 16)
  (global $global0 (mut i32) (i32.const 1048576))
  (global $global1 i32 (i32.const 1048576))
  (global $global2 i32 (i32.const 1048576))
  (export "memory" (memory $memory0))
  (export "_start" (func $func0))
  (export "__data_end" (global $global1))
  (export "__heap_base" (global $global2))
  (func $func0
    (local $var0 i32) (local $var1 i32) (local $var2 i32) (local $var3 i32) (local $var4 i32) (local $var5 i32) (local $var6 i32)
    global.get $global0
    local.set $var0
    i32.const 160
    local.set $var1
    local.get $var0
    local.get $var1
    i32.sub
    local.set $var2
    local.get $var2
    global.set $global0
    i32.const 160
    local.set $var3
    i32.const 0
    local.set $var4
    local.get $var2
    local.get $var4
    local.get $var3
    call $func1
    drop
    i32.const 160
    local.set $var5
    local.get $var2
    local.get $var5
    i32.add
    local.set $var6
    local.get $var6
    global.set $global0
    return
  )
  (func $func1 (param $var0 i32) (param $var1 i32) (param $var2 i32) (result i32)
    (local $var3 i32) (local $var4 i32) (local $var5 i32)
    block $label1
      block $label0
        local.get $var2
        i32.const 16
        i32.ge_u
        br_if $label0
        local.get $var0
        local.set $var3
        br $label1
      end $label0
      local.get $var0
      i32.const 0
      local.get $var0
      i32.sub
      i32.const 3
      i32.and
      local.tee $var4
      i32.add
      local.set $var5
      block $label2
        local.get $var4
        i32.eqz
        br_if $label2
        local.get $var0
        local.set $var3
        loop $label3
          local.get $var3
          local.get $var1
          i32.store8
          local.get $var3
          i32.const 1
          i32.add
          local.tee $var3
          local.get $var5
          i32.lt_u
          br_if $label3
        end $label3
      end $label2
      local.get $var5
      local.get $var2
      local.get $var4
      i32.sub
      local.tee $var4
      i32.const -4
      i32.and
      local.tee $var2
      i32.add
      local.set $var3
      block $label4
        local.get $var2
        i32.const 1
        i32.lt_s
        br_if $label4
        local.get $var1
        i32.const 255
        i32.and
        i32.const 16843009
        i32.mul
        local.set $var2
        loop $label5
          local.get $var5
          local.get $var2
          i32.store
          local.get $var5
          i32.const 4
          i32.add
          local.tee $var5
          local.get $var3
          i32.lt_u
          br_if $label5
        end $label5
      end $label4
      local.get $var4
      i32.const 3
      i32.and
      local.set $var2
    end $label1
    block $label6
      local.get $var2
      i32.eqz
      br_if $label6
      local.get $var3
      local.get $var2
      i32.add
      local.set $var5
      loop $label7
        local.get $var3
        local.get $var1
        i32.store8
        local.get $var3
        i32.const 1
        i32.add
        local.tee $var3
        local.get $var5
        i32.lt_u
        br_if $label7
      end $label7
    end $label6
    local.get $var0
  )
)
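
Read as pseudocode, $func1 above appears to be a word-at-a-time fill. Buffers shorter than 16 bytes are written byte by byte. Longer ones get a byte prefix up to 4-byte alignment, then a body of 32-bit stores of the byte multiplied by 0x01010101 (the 16843009 constant), then a byte tail. A rough Rust rendering of that reading follows; `fill_words` is a hypothetical name, and this is a sketch of the disassembly, not the compiler-builtins source.

```rust
/// Fill `buf` with `value`, following the structure of `$func1`:
/// byte prefix to 4-byte alignment, 4-byte body, byte tail.
pub fn fill_words(buf: &mut [u8], value: u8) {
    const WORD: usize = 4;

    // Short buffers: plain byte loop (the `n < 16` path in the module).
    if buf.len() < 16 {
        for b in buf.iter_mut() {
            *b = value;
        }
        return;
    }

    // Bytes needed to reach the next 4-byte boundary: (0 - addr) & 3.
    let addr = buf.as_ptr() as usize;
    let head = addr.wrapping_neg() & (WORD - 1);
    let (prefix, rest) = buf.split_at_mut(head);
    for b in prefix {
        *b = value;
    }

    // Largest multiple of 4 that fits in the remainder: len & -4.
    let word_bytes = rest.len() & !(WORD - 1);
    let (body, tail) = rest.split_at_mut(word_bytes);

    // Replicate the byte into all four lanes: value * 0x01010101.
    let pattern = (u32::from(value) * 0x0101_0101).to_ne_bytes();
    for chunk in body.chunks_exact_mut(WORD) {
        chunk.copy_from_slice(&pattern);
    }

    // Remaining 0..=3 bytes.
    for b in tail {
        *b = value;
    }
}
```

Calling `fill_words(&mut buf, 0)` on a 160-byte buffer would take the aligned path, which corresponds to the zero-initialization of the array in `_start`.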
alexcrichton commented 1 week ago

Not really. The defining feature of the wasip1 target is that it uses WASI APIs through wasi-libc, so I suspect the documentation you're looking for doesn't exist.

drebbe-intrepid commented 1 week ago

I would not have expected this behavior at all. It looks like wasm32-unknown-unknown is the correct target for me, but I didn't even realize wasi-libc was a thing until it was mentioned here. It's not even listed in the WASI proposals.

drebbe-intrepid commented 1 week ago

What would be the best place for this documentation?

drebbe-intrepid commented 1 week ago

It looks like clang++ does the same thing with similar C++ code.

extern "C" void _start() {
    int a[255] = {0};
}
$ clang++ --target=wasm32 -flto -nostdlib -Wl,--no-entry -Wl,--export-all -o test.wasm test.cpp
wasm-ld: error: lto.tmp: undefined symbol: memset
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

$ clang++ --version
clang version 18.1.8
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
alexcrichton commented 1 week ago

As to where to document this, I'm not sure! The behavior you're describing here matches native platforms as well, for example:

$ clang++ -flto -nostdlib  -o test.wasm test.cpp
/usr/bin/ld: /tmp/lto-llvm-c0452f.o: in function `_start':
ld-temp.o:(.text._start+0x1a): undefined reference to `memset'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

It's more common than not that libc provides memset, so I'm not sure where to document that in a way that's specific to WASI. Where would you have tried to look for documentation like this? Maybe that's a good place to send a PR?

drebbe-intrepid commented 1 week ago

Maybe the WASI specification should state something about it?

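I think a few things need to happen here (just thoughts, open to ideas):

- rustc should error out like clang++ because it has an undefined reference to memset when no_std is used.
- rustc should probably provide some type of implementation for memset outside libc. I believe emscripten does this.
- libc seems to be a de facto standard in the majority of compilers; having a clear standard for Rust documented would probably be ideal as WASM gains popularity.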

I've never had to care about libc and things like memset because it always just worked but I'm probably not going to be the first one to come across this behavior.

bjorn3 commented 1 week ago

> rustc should error out like clang++ because it has an undefined reference to memset when no_std is used

rustc doesn't because we pass --allow-undefined to the linker. See the comment on the code in question for why we can't simply remove this argument: https://github.com/rust-lang/rust/blob/74fd001cdae0321144a20133f2216ea8a97da476/compiler/rustc_target/src/spec/base/wasm.rs#L31-L40

> rustc should probably provide some type of implementation for memset outside libc.

It does on targets where libc is not expected to be used, such as wasm32-unknown-unknown.

> I believe emscripten does this.

No, emscripten has its own libc that provides memset.

> libc seems to be a de facto standard in the majority of compilers, having a clear standard for rust documented would probably be ideal as WASM gains popularity.

Only for languages that natively interface with C. There are also AssemblyScript, the wasm port of C#, TeaVM (compiling Java to wasm), Hoot (compiling a Scheme to wasm), and more, none of which use any libc.

bjorn3 commented 1 week ago

> I've never had to care about libc and things like memset because it always just worked but I'm probably not going to be the first one to come across this behavior.

If you didn't have to care about it, that is almost certainly because you didn't use -nostdlib, which is the C equivalent of #![no_std]. Without -nostdlib, the linker automatically adds a dependency on libc, just as rustc, when #![no_std] is absent, automatically adds a dependency on libstd and, on targets that need it, on libc.

drebbe-intrepid commented 1 week ago

@bjorn3 awesome, thank you for the information; this is what I was looking for from the start. I'd like to put this information somewhere, but I don't believe Rust has any official documentation on wasm (or on target-specific behavior). I did find this: https://github.com/rustwasm/book

Maybe a PR against this repo would be ideal for now?