rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.78k stars 12.65k forks source link

Rust `1.67.0` stopped initializing the WASI environment for exported functions on `target="wasm32-wasi"` #107635

Open bkolobara opened 1 year ago

bkolobara commented 1 year ago

The code:

fn main() {}

#[export_name="test"]
pub fn test() {}

would produce the following Wasm assembly on Rust 1.66.1 compiled with cargo build --target="wasm32-wasi":

;; ... truncated ...
(func $test.command_export (type 0)
    call $__wasm_call_ctors ;; <- WASI initialized here
;; ... truncated ...
(export "test" (func $test.command_export))
(export "main" (func $main.command_export))
;; ... truncated ...

On 1.67.0 it produces this one:

;; ... truncated ...
(func $test (type 0)
    return)
;; ... truncated ...
(export "test" (func $test))
(export "_start" (func $_start))
;; ... truncated ...

Notice that the call to the $__wasm_call_ctors function is missing in the exported test. This means that when entering the Wasm instance through the exported function, it will not have a correctly set up WASI environment and calls to open files or look up environment variables will fail. The _start function in both versions calls $__wasm_call_ctors and works correctly.

I'm not sure what commit broke this. The only PR that is related to this seems to be this one https://github.com/rust-lang/rust/pull/105405/.

EDIT: Or more likely, this one https://github.com/rust-lang/rust/pull/105395

workingjubilee commented 1 year ago

cc @sunfishcode re: apparent wasm32-wasi functionality regression

sunfishcode commented 1 year ago

I recognize that it worked in wasm32-wasi in earlier versions, however it was by accident. Rust isn't meant to support having programs with a main function which can be entered without calling the main function.

Are you able to switch the crate to a cdylib, by eg. adding this to Cargo.toml, and removing the main function?

[lib]
crate-type = ["cdylib"]

This configuration is not entirely stable yet, but may work as a temporary workaround.

bkolobara commented 1 year ago

The workaround works, but it has some significant drawbacks for us. We have been relying on this functionality for the last 2 years. Just to give a bit more context about our use case. We expose an API to wasm modules so that they can create new instances from the same module, but use a different entry point. This requires us to allow entering the module without calling main. And this is one of our core features.

The update to 1.67.0 is a breaking change for all our users. Changing to the [lib] approach without main will also require updating all documentation and tutorials. It also significantly reduces the developer experience around our tooling. For example, it would not be possible to just use cargo run with a set runner anymore to start up applications. The setup would need to be much more complicated.

We have a fairly big community around our products. Just our discord consists of 800+ members and Rust is our primary language. So this change has a significant impact on us. The best outcome for us would be to allow the previous behaviour, both main and explicitly exported functions being allowed to be used as entry points into the instance.

sunfishcode commented 1 year ago

Directly reverting the PR in question would cause regressions for other users, including users who specifically need the change to not run the initializers on arbitrary exports, and who requested this change. I'm looking into other ways to fix this.

@alexcrichton had the idea that we could remove all use of __attribute__((constructor)) from wasi-libc and convert the last few eager initializations to lazy initialization. That will require some care to make sure we can initialize preopens at the right time, but I'll look into it.

workingjubilee commented 1 year ago

It's worth noting that while WASI is definitely expected to land in some form, it's not a finished, stable part of WebAssembly, because it's big enough as an idea to have its own WASI subproposals.

If I understand correctly, WASI programs have to initialize their own environment, rather than the system function definitions being included via some dynamic linker equivalent transcluding libc-or-whatnot? And I would guess the same for other things, instead of simply having certain things happen implicitly due to the operating system's program loader? Interesting. On most systems, somehow bypassing those two components of the system could have equally annoying consequences, it just doesn't happen because it's (mostly) not possible.

apiraino commented 1 year ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

sunfishcode commented 1 year ago

I've now submitted https://github.com/rust-lang/rust/pull/107866 which changes Rust's use of the global environ variable, which requires wasi-libc to eagerly initialize environment variables to use the wasi-libc __wasilibc_get_environ function which allows it to lazily initialize them. This should have the side effect of fixing at least some instances of the regression here, as eager initialization means that initialization is never done if _start isn't called, while lazy initialization allows it to be done on demand when a function calls std::env::vars() or similar.

hbina commented 1 year ago

I recognize that it worked in wasm32-wasi in earlier versions, however it was by accident. Rust isn't meant to support having programs with a main function which can be entered without calling the main function.

Are you able to switch the crate to a cdylib, by eg. adding this to Cargo.toml, and removing the main function?

[lib]
crate-type = ["cdylib"]

This configuration is not entirely stable yet, but may work as a temporary workaround.

This workaround doesn't seem to work for me. I compiled, linked and initialize a wasm blob here https://github.com/lapce/lapce/blob/master/lapce-proxy/src/plugin/wasi.rs#L360-L520

And it is trying to compile a wasm of this repo https://github.com/hbina/lapce-prettier

And with some grep-foo, I can see that it is trying to initialize the environment variable eagerly?

Compiling as a main.rs,

hbina@akarin ~/g/lapce-prettier (master)> just diagnose
cargo +nightly build --target="wasm32-wasi"
   Compiling lapce_prettier v0.0.0 (/home/hbina/git/lapce-prettier)
    Finished dev [unoptimized + debuginfo] target(s) in 0.75s
wasm-tools print target/wasm32-wasi/debug/lapce_prettier.wasm > ./lapce_prettier.txt
rg -w "export" lapce_prettier.txt
1262454:  (export "memory" (memory 0))
1262455:  (export "_start" (func $_start))
1262456:  (export "__main_void" (func $__main_void))
1262457:  (export "handle_rpc" (func $handle_rpc))

Where _start looks like this

  (func $_start (;9;) (type 2)
    (local i32)
    block ;; label = @1
      block ;; label = @2
        i32.const 0
        i32.load offset=1353400
        br_if 0 (;@2;)
        i32.const 0
        i32.const 1
        i32.store offset=1353400
        call $__wasm_call_ctors
        call $__main_void
        local.set 0
        call $__wasm_call_dtors
        local.get 0
        br_if 1 (;@1;)
        return
      end
      unreachable
      unreachable
    end
    local.get 0
    call $__wasi_proc_exit
    unreachable
  )

And handle_rpc here goes into the actual function immediately without initializing anything.

Compiling as lib.rs with the suggestion above,

hbina@akarin ~/g/lapce-prettier (master)> just diagnose
cargo +nightly build --target="wasm32-wasi"
    Finished dev [unoptimized + debuginfo] target(s) in 0.42s
wasm-tools print target/wasm32-wasi/debug/lapce_prettier.wasm > ./lapce_prettier.txt
rg -w "export" lapce_prettier.txt
1260914:  (export "memory" (memory 0))
1260915:  (export "handle_rpc" (func $handle_rpc.command_export))

Where handle_rpc looks like

  (func $handle_rpc.command_export (;8781;) (type 10)
    call $handle_rpc
    call $__wasm_call_dtors
  )

It doesn't call ctors because....the dtor is just a dummy?

  (func $dummy (;8593;) (type 10))
  (func $__wasm_call_dtors (;8594;) (type 10)
    call $dummy
    call $dummy
  )

but it doesn't seem to work still.

Edit: Is there a tool to do this for me? Something like wasm-tools print --export $name to print the body of the exported function called $name. Is there a jq equivalent for WASM yet?

workingjubilee commented 1 year ago

@hbina Can you try again with the latest nightly? It should be carrying the partial fix mentioned, which should cover your case.

hbina commented 1 year ago

Hi @workingjubilee I have updated my previous post to add some clarity and use +nightly and it still doesn't work :( Whatever happened to _initialize?

sunfishcode commented 1 year ago

The latest nightly now includes https://github.com/rust-lang/rust/pull/107866 and lazily initializes environment variables, at least in simple examples, which I hope will fix the immediate bug with environment variables.

For the more general fix, I've now submitted https://github.com/rust-lang/rust/pull/108097 which switches Rust's cdylib on Wasm to always contain an _initialize function. This is debatable, and my comment here discusses the options.