Compilation to WASM? - Githubissues

RReverser commented 5 years ago

Miri maintainer note: this is a fun project, but not something we currently intend to support officially. To keep maintenance manageable, Miri only supports running on platforms that rustc supports running on.

Compiling the whole Rustc to WASM is a pretty big undertaking for many reasons.

However, Miri doesn't need an actual codegen and many other parts of the whole Rustc, so I wonder how realistic it would be to compile it and the pieces it depends on to WASM instead? Are there any obvious blockers?

Mostly opening this to gauge interest and estimate complexity, as I believe there is an interest in running Rust directly in the browser on playground-like websites.

P.S. Despite what I said in the first sentence, this was actually done for Clang a while ago - https://tbfleming.github.io/cib/ - which includes LLVM compiled to WASM that, in turn, generates more WASM dynamically during runtime. In theory, it should be possible to do the same for Rust, especially since they share LLVM, but for now having just an interpreter could already be an interesting starting goal.

oli-obk commented 5 years ago

I think a first step would be to create a new codegen backend which doesn't actually do any codegen and just dumps the metadata. That way we should be able to build a rustc which doesn't depend on llvm or other C code.

bjorn3 commented 5 years ago

I think a first step would be to create a new codegen backend which doesn't actually do any codegen and just dumps the metadata.

I had created one in the past but it bitrotted and it was unused, so I removed it in https://github.com/rust-lang/rust/pull/58847. If you copy https://github.com/bjorn3/rustc_codegen_cranelift/blob/11d816c/src/lib.rs#L182-L190 into provide and provide_extern it should work. (Also change target_features_whitelist to contain the same as cg_llvm https://github.com/rust-lang/rust/blob/2d401fb4dc89eaef5b8f31330636094f9c26b4c4/src/librustc_codegen_llvm/llvm_util.rs#L249, otherwise stdsimd won't compile.)

Another necessary change is replacing the dlopen that loads the codegen backend with a regular extern crate. E.g., replace the match at https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/librustc_interface/util.rs#L271 with _ => || Box::new(MetadataOnlyCodegenBackend) or whatever you call the backend.
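The static-selection idea can be sketched in isolation. The trait and function below are simplified, hypothetical stand-ins rather than rustc's real `CodegenBackend` API (the real trait has many more methods); the sketch only models returning a statically linked backend constructor where a dlopen lookup used to be.

```rust
// Simplified, hypothetical stand-in for rustc's codegen backend trait.
// It only models backend selection, not actual compilation.
trait CodegenBackend {
    fn name(&self) -> &'static str;
    /// Returns true if this backend produces machine code.
    fn emits_machine_code(&self) -> bool;
}

/// A backend that does no codegen and would only dump crate metadata.
struct MetadataOnlyCodegenBackend;

impl CodegenBackend for MetadataOnlyCodegenBackend {
    fn name(&self) -> &'static str {
        "metadata-only"
    }

    fn emits_machine_code(&self) -> bool {
        false
    }
}

/// Instead of dlopen-ing a shared library chosen by `_requested`, always
/// return a constructor for the backend that is linked in statically.
fn get_codegen_backend(_requested: &str) -> fn() -> Box<dyn CodegenBackend> {
    || Box::new(MetadataOnlyCodegenBackend) as Box<dyn CodegenBackend>
}

fn main() {
    let make_backend = get_codegen_backend("llvm");
    let backend = make_backend();
    println!("using backend: {}", backend.name());
}
```

Because the backend is an ordinary crate dependency here, nothing needs to be loaded at runtime, which is what a target without dynamic loading requires.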

bjorn3 commented 5 years ago

I want to do the same for https://github.com/bjorn3/rustc_codegen_cranelift/, but I want it to pass the rustc test suite first and cranelift doesn't support wasm output yet.

RalfJung commented 5 years ago

Intriguing. :) I should add one warning though: Miri isn't a fast interpreter. It's really slow. So I don't think it is actually a good environment to use to run code, I see it as more useful for debugging and testing.

But, don't let me stop you! I just felt I should give you a fair warning. And if ideas like this lead to people making Miri lightning fast while maintaining all the UB checking, I'll be even happier. :D

RReverser commented 5 years ago

Miri isn't a fast interpreter. It's really slow. So I don't think it is actually a good environment to use to run code, I see it as more useful for debugging and testing.

That's understandable, but I think it ought to be good enough for typical playground snippets :)

RReverser commented 5 years ago

I think a first step would be to create a new codegen backend which doesn't actually do any codegen and just dumps the metadata. That way we should be able to build a rustc which doesn't depend on llvm or other C code.

I guess that's one way, although I was wondering if Miri actually needs the main rustc crate or maybe it could be possible to depend only on some of the finer-grained rustc_* crates and avoid including codegen altogether?

bjorn3 commented 5 years ago

The codegen backend is necessary for rustc_driver to work. Using a dummy codegen backend (<100LOC mostly copyable from the code I mentioned in https://github.com/rust-lang/miri/issues/722#issuecomment-489019303) is a lot easier than duplicating all the things rustc_driver does (>>1000LOC).

RReverser commented 5 years ago

Fair enough.

bjorn3 commented 5 years ago

I am currently trying to compile rustc for wasm (https://github.com/bjorn3/rust/tree/compile_rustc_for_wasm), but I am hitting a compiler bug: https://github.com/rust-lang/rust/issues/60540.

RReverser commented 5 years ago

@bjorn3 I've rebased your branch onto master, updated deps and fixed cfg's from target_env to target_os - you can check it out at https://github.com/RReverser/rust/tree/compile_rustc_for_wasm.
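A minimal sketch of gating on `target_os` rather than `target_env`, assuming the WASI target reports `target_os = "wasi"`. The function names are hypothetical and not taken from either branch.

```rust
/// Whether helper threads may be spawned on the current target.
/// The WASI target discussed in this thread has no thread support,
/// so the check is keyed on `target_os`.
fn threads_supported() -> bool {
    !cfg!(target_os = "wasi")
}

// Compile-time selection: only one of these two definitions is built.
#[cfg(target_os = "wasi")]
fn platform_name() -> &'static str {
    "wasi"
}

#[cfg(not(target_os = "wasi"))]
fn platform_name() -> &'static str {
    "native"
}

fn main() {
    println!(
        "{}: threads supported = {}",
        platform_name(),
        threads_supported()
    );
}
```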

Eventually it compiled successfully, but then ran into the same runtime validation issue with invalid code generated by Rust.

However, I recompiled in release mode and then it passed validation!

That got me thinking it should work now, but running the generated file with wasmtime or wasmer now seems to just hang. Some infinite loop somewhere perhaps?

RReverser commented 5 years ago

@bjorn3 Oh... maybe it's just been taking so long (especially the compilation part). I've tried wasmer with --backend singlepass instead now, and it has actually worked!

$ ./wasmer run target/wasm32-unknown-wasi/release/rustc_binary.wasm --backend singlepass
Usage: rustc [OPTIONS] INPUT

Options:
    -h, --help          Display this message
        --cfg SPEC      Configure the compilation environment
    -L [KIND=]PATH      Add a directory to the library search path. The
                        optional KIND can be one of dependency, crate, native,
                        framework or all (the default).
    -l [KIND=]NAME      Link the generated crate(s) to the specified native
                        library NAME. The optional KIND can be one of
                        static, dylib, or framework. If omitted, dylib is
                        assumed.
        --crate-type [bin|lib|rlib|dylib|cdylib|staticlib|proc-macro]
                        Comma separated list of types of crates
                        for the compiler to emit
        --crate-name NAME
                        Specify the name of the crate being built
        --edition 2015|2018
                        Specify which edition of the compiler to use when
                        compiling code.
        --emit [asm|llvm-bc|llvm-ir|obj|metadata|link|dep-info|mir]
                        Comma separated list of types of output for the
                        compiler to emit
        --print [crate-name|file-names|sysroot|cfg|target-list|target-cpus|target-features|relocation-models|code-models|tls-models|target-spec-json|native-static-libs]
                        Comma separated list of compiler information to print
                        on stdout
    -g                  Equivalent to -C debuginfo=2
    -O                  Equivalent to -C opt-level=2
    -o FILENAME         Write output to <filename>
        --out-dir DIR   Write output to compiler-chosen filename in <dir>
        --explain OPT   Provide a detailed explanation of an error message
        --test          Build a test harness
        --target TARGET Target triple for which the code is compiled
    -W, --warn OPT      Set lint warnings
    -A, --allow OPT     Set lint allowed
    -D, --deny OPT      Set lint denied
    -F, --forbid OPT    Set lint forbidden
        --cap-lints LEVEL
                        Set the most restrictive lint level. More restrictive
                        lints are capped at this level
    -C, --codegen OPT[=VALUE]
                        Set a codegen option
    -V, --version       Print version info and exit
    -v, --verbose       Use verbose output

Additional help:
    -C help             Print codegen options
    -W help             Print 'lint' options and default settings
    -Z help             Print unstable compiler options
    --help -v           Print the full set of options rustc accepts

bjorn3 commented 5 years ago

Oh... maybe it's just been taking so long (especially the compilation part).

Yes, it takes several minutes to compile it using wasmtime with cranelift as backend.

However, I recompiled in release mode and then it passed validation!

🎉 🎉 🎉

and it has actually worked!

I tried actually compiling something, but it errors with:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: StringError("operation not supported on wasm yet") }', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

I am currently trying to figure out where it errors.

bjorn3 commented 5 years ago

Places needing patching:

librustc_interface/util.rs: spawn_thread_pool must not spawn a thread.
librustc/session/filesearch.rs: get_or_default_sysroot must not be called, as it needs std::env::current_exe. I am now passing an explicit --sysroot.
librustc/session/mod.rs: build_session_ local working_dir is created using std::env::current_dir.

Edit: pushed bjorn3/rust@15f980f60947eb63c9186db80ec5264efdad53fa (based on @RReverser's branch).
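The three patches listed above follow one pattern: avoid the unsupported std call on WASI and fall back to something explicit. A rough sketch of that pattern, with hypothetical function names (`sysroot`, `working_dir`, `run_compiler`) that are not the ones in rustc, and assuming the usual layout where the sysroot sits two directories above the executable:

```rust
use std::env;
use std::path::PathBuf;

/// Pick the sysroot. On WASI `std::env::current_exe` is unavailable, so an
/// explicit `--sysroot` value is required there.
fn sysroot(explicit: Option<PathBuf>) -> Result<PathBuf, String> {
    if let Some(path) = explicit {
        return Ok(path);
    }
    if cfg!(target_os = "wasi") {
        return Err("pass --sysroot explicitly on WASI".to_string());
    }
    let exe = env::current_exe().map_err(|e| e.to_string())?;
    // Assumed layout for this sketch: <sysroot>/bin/<executable>.
    exe.parent()
        .and_then(|bin_dir| bin_dir.parent())
        .map(|root| root.to_path_buf())
        .ok_or_else(|| "executable has no grandparent directory".to_string())
}

/// Working directory, falling back to "." when the platform cannot report it.
fn working_dir() -> PathBuf {
    env::current_dir().unwrap_or_else(|_| PathBuf::from("."))
}

/// Run the compiler closure on the current thread on WASI, and on a
/// separate (scoped) thread everywhere else.
fn run_compiler<R: Send, F: FnOnce() -> R + Send>(f: F) -> R {
    if cfg!(target_os = "wasi") {
        f()
    } else {
        std::thread::scope(|s| s.spawn(f).join().expect("compiler thread panicked"))
    }
}

fn main() {
    let root = sysroot(Some(PathBuf::from("/opt/rust-sysroot")));
    println!("sysroot: {:?}, cwd: {:?}", root, working_dir());
    println!("result: {}", run_compiler(|| "compiled"));
}
```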

Now it errors at

thread 'main' panicked at 'unknown codegen backend llvm', src/librustc_interface/util.rs:277:18

Which is expected, as I had to remove the codegen backend dynamic loader. Will try to get https://github.com/bjorn3/rustc_codegen_cranelift to work with it.

RReverser commented 5 years ago

Places needing patching

FWIW previously (before even filing this issue) I tried compiling rustc with Emscripten instead, which should, in theory, reduce the number of these places to patch, as it supports a bit more than WASI does. I haven't gotten too far though, because I tried to build a completely unpatched rustc and there were a few things that still didn't compile and probably needed fixes similar to those in your branch.

Which is expected, as I had to remove the codegen backend dynamic loader. Will try to get bjorn3/rustc_codegen_cranelift to work with it.

I thought the plan was to build it without any codegen, just with miri? Or do you want to build an actual full rustc?

bjorn3 commented 5 years ago

I thought the plan was to build it without any codegen, just with miri? Or do you want to build an actual full rustc?

I want them both. :) I currently have rustc_codegen_cranelift hooked up, but rustc gives an error before rustc_codegen_cranelift can do actual codegen: can't find crate for `std`. Supporting miri will need that error to be fixed too.

RReverser commented 5 years ago

can't find crate for std

Yeah for that I think you'll need to do the proper build (via x.py build) to build all components. I haven't had much luck with that yet due to failures in other crates which probably also need to be patched similarly to rustc itself.

bjorn3 commented 5 years ago

Yeah for that I think you'll need to do the proper build (via x.py build) to build all components.

Seems like it doesn't even reach the rustc version check for the libraries. I added --sysroot $(rustc --print sysroot) and disabled the rustc version check, but it still gives the same error.

bjorn3 commented 5 years ago

Switching from wasmer to wasmtime fixed it. It even got to the beginning of codegen.

Edit: filed https://github.com/wasmerio/wasmer/issues/434.

bjorn3 commented 5 years ago

I am currently working on making miri compile for wasi, which this issue was actually about.

bjorn3 commented 5 years ago

It seems to trap while calling the ecx.run. :(

error while processing main module ../../target/wasm32-unknown-wasi/release/rustc_binary.wasm: Instantiation error: Trap occurred while invoking start function: wasm trap at 0x2a881eb82

I pushed the wip stuff to my branch.

RReverser commented 5 years ago

@bjorn3 Left a comment on your MIRI commit on your branch.

RReverser commented 5 years ago

@bjorn3 But also, I'm not sure why rustc is now depending on miri... shouldn't it be the other way around? (like in non-WASI version)

bjorn3 commented 5 years ago

I did that to avoid having to recompile every rustc crate, which is slow, and to avoid having to copy all files into the directory layout rustc expects a sysroot to have.

RReverser commented 5 years ago

I'm not sure I understand what you're saying... neither should be affected by which crate you compile as an entry point.

I've changed my local copy of Rust & MIRI to do just that, and got miri.wasm successfully, but yeah, also hitting some trap.

bjorn3 commented 5 years ago

I meant that I had already compiled all crates in rust/target. When switching to miri as crate root, I would have to recompile all crates into miri/target. By keeping rustc-binary as crate root, I could reuse rust/target as target dir.

RReverser commented 5 years ago

When switching to miri as crate root, I would have to recompile all crates into miri/target

Not when you use miri in the src/tools/miri path - then it reuses the same workspace.

bjorn3 commented 5 years ago

Yes, but I didn't think about that possibility. I just used the miri I had already cloned outside my rust clone.

RReverser commented 5 years ago

Yeah, in conjunction with your comment on that commit it makes sense now. Shouldn't be hard to switch.

bjorn3 commented 5 years ago

Things being patched (how patched / how to prevent in the future):

[x] measureme (rust-lang/measureme#43)
[ ] jobserver (alexcrichton/jobserver-rs#13)
[ ] ~~backtrace (remove dep or stub out on wasm, alexcrichton/backtrace-rs#178)~~
[x] flate2 (enable rust_backend feature and disable default features, https://github.com/alexcrichton/flate2-rs/pull/194)
[ ] std::env::current_dir (use . instead)
[ ] std::path::Path::canonicalize (just don't)
[ ] rustc_interface::util::get_resident (remove)
[ ] memmap (use Vec<u8> on WASI https://github.com/danburkert/memmap-rs/issues/88#issuecomment-492320205)
[ ] threads (run in one thread)
[ ] several other things
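For the memmap item, the fallback amounts to reading the file into an owned buffer that can be used wherever a mapped byte slice was expected. A minimal sketch, where the `FileBytes` type is hypothetical and not part of memmap or rustc:

```rust
use std::fs;
use std::io;
use std::ops::Deref;
use std::path::Path;

/// Owned file contents that stand in for a memory map on targets
/// without mmap support.
struct FileBytes(Vec<u8>);

impl FileBytes {
    /// Read the whole file into memory. A native build could map the
    /// file instead; this sketch always reads it.
    fn open(path: &Path) -> io::Result<FileBytes> {
        fs::read(path).map(FileBytes)
    }
}

// Dereferencing to `[u8]` lets callers treat it like a mapped region.
impl Deref for FileBytes {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.0
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("file_bytes_sketch.bin");
    fs::write(&path, b"rust")?;
    let bytes = FileBytes::open(&path)?;
    println!("{} bytes, first = {:?}", bytes.len(), bytes.first());
    fs::remove_file(&path)
}
```

The trade-off is memory: the whole file is copied, which is acceptable for metadata-sized inputs but not free for large ones.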

RReverser commented 5 years ago

What's wrong with flate2? It worked well in WASM for me (in other projects). See https://github.com/alexcrichton/flate2-rs/issues/161.

bjorn3 commented 5 years ago

It didn't compile. I believe I will have to enable the rust_backend feature and disable the default features.

RReverser commented 5 years ago

Btw, when using Cranelift backend instead, the trap has extra detail:

thread 'main' panicked at 'wasm trap occured: memory out-of-bounds access', src/bin/wasmer.rs:359:51

Not that it's very helpful without a stacktrace.

bjorn3 commented 5 years ago

That makes me suspect https://github.com/rust-lang/rust/blob/efa3c27f0ff21960b9309f8036dbf3e7416b9e52/src/librustc_mir/interpret/memory.rs#L743-L768. It is the only block of unsafe code in rustc_mir::interpret.

bjorn3 commented 5 years ago

I tried compiling wasi rustc with debuginfo to get a backtrace with lldb, but wasmtime panics while processing the debuginfo (https://github.com/CraneStation/wasmtime/issues/144)

RReverser commented 5 years ago

@bjorn3 Wow, this seems to be a decent stress test that uncovers all sorts of bugs in WASM tooling. Thanks for working on it!

bjorn3 commented 5 years ago

Yeah, I expected rustc itself to be the hardest part of compiling it for WASM, not the tools. :)

bjorn3 commented 5 years ago

https://github.com/rust-lang/rust/pull/60831 got merged recently, haven't been able to confirm that the miscompilation in debug mode is fixed though.

RReverser commented 5 years ago

measureme fix has been merged too, worth updating the list above?

bjorn3 commented 5 years ago

Got some logs for the miri SIGSEGV:

```
[...]
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] (_0.2: std::marker::PhantomData<&T>) = move _21
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] StorageDead(_21)
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] StorageDead(_20)
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] StorageDead(_19)
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] StorageDead(_8)
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] StorageDead(_2)
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] return
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::eval_context] LEAVING(7) core::slice::::iter
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::eval_context] CONTINUING(6) std::vec::Vec::::extend_from_slice
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] // bb1
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] StorageDead(_5)
[2019-05-15T16:50:10Z INFO rustc_mir::interpret::step] _0 = const std::vec::SpecExtend::spec_extend(move _3, move _4) -> bb2
Process 22977 stopped
* thread #1: tid = 22977, 0x00007ffdf4f28dd2 JIT(0x7ffe097f4010), name = 'wasmtime', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x00007ffdf4f28dd2 JIT(0x7ffe097f4010)
->  0x7ffdf4f28dd2: movslq (%rcx,%rax,4), %rax
    0x7ffdf4f28dd6: addq %rax, %rcx
    0x7ffdf4f28dd9: jmpq *%rcx
    0x7ffdf4f28ddc: movl 0x4fc(%rsp), %eax
(lldb) bt
* thread #1: tid = 22977, 0x00007ffdf4f28dd2 JIT(0x7ffe097f4010), name = 'wasmtime', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x00007ffdf4f28dd2 JIT(0x7ffe097f4010)
    frame #1: 0x00007ffdf4f17ed0 JIT(0x7ffe097f4010)
    frame #2: 0x00007ffdf6c23e05 JIT(0x7ffe097f4010)
    frame #3: 0x00007ffdf6c21d8b JIT(0x7ffe097f4010)
    frame #4: 0x00007ffdf4e6880d JIT(0x7ffe097f4010)
    frame #5: 0x00007ffdf6b6842a JIT(0x7ffe097f4010)
    frame #6: 0x00007ffdf6acbb0a JIT(0x7ffe097f4010)
    frame #7: 0x00007ffdf6c3fdaa JIT(0x7ffe097f4010)
    frame #8: 0x00007ffdf4708a2e JIT(0x7ffe097f4010)
    frame #9: 0x00007ffdf6bf9418 JIT(0x7ffe097f4010)
    frame #10: 0x00007ffdf6bf615d JIT(0x7ffe097f4010)
    frame #11: 0x00007ffdf592b819 JIT(0x7ffe097f4010)
    frame #12: 0x00007ffdf67368cb JIT(0x7ffe097f4010)
    frame #13: 0x00007ffdf44e2bbd JIT(0x7ffe097f4010)
    frame #14: 0x00007ffdf5351dff JIT(0x7ffe097f4010)
    frame #15: 0x00007ffdf459e5c7 JIT(0x7ffe097f4010)
    frame #16: 0x00007ffdf5344b6b JIT(0x7ffe097f4010)
    frame #17: 0x00007ffdf42f33a3 JIT(0x7ffe097f4010)
    frame #18: 0x00007ffdee912992 JIT(0x7ffe097f4010)
    frame #19: 0x00007ffdeebdcc83 JIT(0x7ffe097f4010)
    frame #20: 0x00007ffdedbf895c JIT(0x7ffe097f4010)
    frame #21: 0x00007ffdedbfa6d8 JIT(0x7ffe097f4010)
    frame #22: 0x00007ffdeef72e1a JIT(0x7ffe097f4010)
    frame #23: 0x00007ffdedbfa4bc JIT(0x7ffe097f4010)
    frame #24: 0x00007ffdedc039fe JIT(0x7ffe097f4010)
    frame #25: 0x00007ffdedf2fd72 JIT(0x7ffe097f4010)
    frame #26: 0x00007ffdedf64859 JIT(0x7ffe097f4010)
    frame #27: 0x00007ffdedf1d207 JIT(0x7ffe097f4010)
    frame #28: 0x00007ffdee0e194e JIT(0x7ffe097f4010)
    frame #29: 0x00007ffdedbf3f23 JIT(0x7ffe097f4010)
    frame #30: 0x00007ffdedbd620d JIT(0x7ffe097f4010)
    frame #31: 0x00007ffdf7f9ffdc JIT(0x7ffe097f4010) at lib.rs:85
    frame #32: 0x00007ffdedbd38ce JIT(0x7ffe097f4010)
    frame #33: 0x00007ffdedc07f92 JIT(0x7ffe097f4010)
    frame #34: 0x00007ffdedbf6111 JIT(0x7ffe097f4010)
    frame #35: 0x00007ffdf7f8e8ff JIT(0x7ffe097f4010) at rt.rs:49
    frame #36: 0x00007ffdf7f98b38 JIT(0x7ffe097f4010) at rt.rs:49
    frame #37: 0x00007ffdf7f9ffdc JIT(0x7ffe097f4010) at lib.rs:85
    frame #38: 0x00007ffdf7f9b954 JIT(0x7ffe097f4010) at panicking.rs:272
    frame #39: 0x00007ffdedc084e5 JIT(0x7ffe097f4010)
    frame #40: 0x00007ffdedbc486b JIT(0x7ffe097f4010)
    frame #41: 0x0000555555676071 wasmtime`wasmtime_call + 433
    frame #42: 0x0000555555670823 wasmtime`wasmtime_runtime::instance::Instance::invoke_function::h1366e5c600361498 + 147
    frame #43: 0x0000555555673274 wasmtime`wasmtime_runtime::instance::InstanceHandle::new::hf352a471df9e241a + 8644
    frame #44: 0x000055555560d1c8 wasmtime`wasmtime_jit::instantiate::instantiate::hbbe129f222e6a659 + 712
```

It is not really helpful, as https://github.com/CraneStation/wasmtime/issues/144 forces me to compile it without debuginfo.

RReverser commented 5 years ago

@bjorn3 What about wasmer? Does it use debug information at least for the stacktrace?

bjorn3 commented 5 years ago

Wasmer seems to have some problems with WASI compliance: https://github.com/wasmerio/wasmer/issues/434. It used to be unable to read directories, and now it can't read /home/bjorn/.cache/miri/HOST/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd.so.

RReverser commented 5 years ago

@bjorn3 You might want to pass a more precise nested path in --dir - at least that did the trick for me before in a similar issue.

bjorn3 commented 5 years ago

Passing --dir $MIRI_SYSROOT/lib/rustlib/x86_64-unknown-linux-gnu/lib as an extra argument, or using it in place of --dir $MIRI_SYSROOT, doesn't work.

RReverser commented 5 years ago

I see, interesting. Btw, do you think libstd.so is anyway going to help here? I suppose for execution in Wasm we'd need to compile libstd to Wasm as well...

bjorn3 commented 5 years ago

I suppose for execution in Wasm we'd need to compile libstd to Wasm as well...

Actually not, we only need to read the metadata rustc stores in it, so telling rustc to "compile" the executable to be interpreted for the same target that libstd was compiled for should just work:tm:.

RReverser commented 5 years ago

I see. Does that mean we'll still need to ship the whole libstd.so to browsers and such or only the metadata parts?

bjorn3 commented 5 years ago

It actually first tries to load libstd.rlib. This is just an ar archive, so it is possible to remove all object files from it and just keep the rust.metadata.bin file it contains in the archive.
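Stripping an rlib down to its metadata member can be sketched with a small `ar` reader. This is an illustration only, not rustc tooling: it assumes the GNU archive variant (a `//` long-name table and `/offset` references), does not handle BSD-style `#1/len` names, and all function names are hypothetical.

```rust
const MAGIC: &[u8] = b"!<arch>\n";
const HEADER_LEN: usize = 60;

/// Resolve a member name in GNU `ar` format, using the long-name table if needed.
fn member_name(raw: &[u8], long_names: &[u8]) -> String {
    let field = String::from_utf8_lossy(raw).trim_end().to_string();
    if field == "/" || field == "//" {
        return field; // symbol table or long-name table
    }
    if let Some(offset) = field.strip_prefix('/').and_then(|s| s.parse::<usize>().ok()) {
        // Long names live in the `//` member, each terminated by "/\n".
        let rest = long_names.get(offset..).unwrap_or(&[]);
        let end = rest.iter().position(|&b| b == b'\n').unwrap_or(rest.len());
        return String::from_utf8_lossy(&rest[..end])
            .trim_end_matches('/')
            .to_string();
    }
    field.trim_end_matches('/').to_string()
}

/// Copy `archive`, keeping only the long-name table and the member named `keep`.
/// The symbol table is dropped because it would point at removed objects.
/// Returns `None` if the input is not a well-formed archive.
fn keep_member(archive: &[u8], keep: &str) -> Option<Vec<u8>> {
    if !archive.starts_with(MAGIC) {
        return None;
    }
    let mut out = MAGIC.to_vec();
    let mut long_names: &[u8] = &[];
    let mut pos = MAGIC.len();
    while pos < archive.len() {
        let header = archive.get(pos..pos + HEADER_LEN)?;
        // Bytes 48..58 of the header hold the decimal member size.
        let size: usize = std::str::from_utf8(&header[48..58]).ok()?.trim().parse().ok()?;
        let data_start = pos + HEADER_LEN;
        let data = archive.get(data_start..data_start + size)?;
        let name = member_name(&header[..16], long_names);
        if name == "//" {
            long_names = data;
        }
        if name == "//" || name == keep {
            out.extend_from_slice(header);
            out.extend_from_slice(data);
            if size % 2 == 1 {
                out.push(b'\n'); // members are padded to an even length
            }
        }
        pos = data_start + size + size % 2;
    }
    Some(out)
}

/// Demo helper: append one member with the given raw name field (at most 16 bytes).
fn push_member(out: &mut Vec<u8>, name_field: &str, data: &[u8]) {
    let header = format!(
        "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n",
        name_field, 0, 0, 0, 644, data.len()
    );
    out.extend_from_slice(header.as_bytes());
    out.extend_from_slice(data);
    if data.len() % 2 == 1 {
        out.push(b'\n');
    }
}

fn main() {
    let mut archive = MAGIC.to_vec();
    push_member(&mut archive, "//", b"rust.metadata.bin/\n");
    push_member(&mut archive, "std.o/", b"object code");
    push_member(&mut archive, "/0", b"metadata");
    let stripped = keep_member(&archive, "rust.metadata.bin").expect("valid archive");
    println!("{} -> {} bytes", archive.len(), stripped.len());
}
```

Keeping the `//` table untouched means the `/offset` reference in the surviving member header stays valid without rewriting it.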

bjorn3 commented 5 years ago

At least the combination of https://github.com/rust-lang/rust/commit/709120b32146e74c19ecb53fd58b2b108fa9096a and the wasmer fix in https://github.com/wasmerio/wasmer/pull/446 seems to fix the miri SIGSEGV.

bjorn3 commented 5 years ago

Using wasmtime with that commit still gives SIGSEGV.

MarkMcCaskey commented 5 years ago

@bjorn3 What about wasmer? Does it use debug information at least for the stacktrace?

To get debug information from wasmer run it like:

cargo run --release --features debug -- ...

I'll add it to my todo list to document this somewhere (and also improve the signal:noise ratio while I'm at it)

Wasmer's WASI implementation is being prioritized based on need right now (until it becomes more stable and we define some spec tests), so please don't hesitate to file issues on Wasmer or ping us if there's anything we can do to help!

I'm super excited about your work here -- thanks for doing it!

rust-lang / miri

Compilation to WASM? #722