Open RReverser opened 5 years ago
I think a first step would be to create a new codegen backend which doesn't actually do any codegen and just dumps the metadata. That way we should be able to build a rustc which doesn't depend on llvm or other C code.
I think a first step would be to create a new codegen backend which doesn't actually do any codegen and just dumps the metadata.
I had created one in the past, but it bitrotted and was unused, so I removed it in https://github.com/rust-lang/rust/pull/58847. If you copy https://github.com/bjorn3/rustc_codegen_cranelift/blob/11d816c/src/lib.rs#L182-L190 into provide and provide_extern it should work. (Also change target_features_whitelist to contain the same as cg_llvm https://github.com/rust-lang/rust/blob/2d401fb4dc89eaef5b8f31330636094f9c26b4c4/src/librustc_codegen_llvm/llvm_util.rs#L249, otherwise stdsimd won't compile.)
Another thing necessary is replacing the dlopen for loading the codegen backend with a regular extern crate. E.g. replace the match at https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/librustc_interface/util.rs#L271 with _ => || Box::new(MetadataOnlyCodegenBackend) or whatever you call the backend.
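The idea of swapping the dlopen-based loader for a statically linked backend can be sketched in isolation. This is only an illustration: the CodegenBackend trait below is a hypothetical two-method stand-in, not rustc's real trait, and get_codegen_backend only mimics the shape of the match mentioned above.

```rust
/// Hypothetical stand-in for rustc's codegen backend trait (not the real API).
trait CodegenBackend {
    fn name(&self) -> &'static str;
    /// Pretends to "compile" a crate and returns the name of the emitted file.
    fn codegen_crate(&self, crate_name: &str) -> String;
}

/// A backend that produces no machine code, only a metadata file name.
struct MetadataOnlyCodegenBackend;

impl CodegenBackend for MetadataOnlyCodegenBackend {
    fn name(&self) -> &'static str {
        "metadata_only"
    }

    fn codegen_crate(&self, crate_name: &str) -> String {
        format!("lib{}.rmeta", crate_name)
    }
}

/// Returns a backend constructor without any dynamic loading.
fn get_codegen_backend(requested: &str) -> fn() -> Box<dyn CodegenBackend> {
    match requested {
        // Every name resolves to the one backend linked into the binary.
        _ => || Box::new(MetadataOnlyCodegenBackend) as Box<dyn CodegenBackend>,
    }
}

fn main() {
    let backend = get_codegen_backend("llvm")();
    println!("{} -> {}", backend.name(), backend.codegen_crate("demo"));
}
```

Because the backend is an ordinary crate dependency here, nothing has to be loaded at runtime, which matters on targets without dynamic loading.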
I want to do the same for https://github.com/bjorn3/rustc_codegen_cranelift/, but I want it to pass the rustc test suite first, and cranelift doesn't support wasm output yet.
Intriguing. :) I should add one warning though: Miri isn't a fast interpreter. It's really slow. So I don't think it is actually a good environment to use to run code, I see it as more useful for debugging and testing.
But don't let me stop you! I just felt I should give you a fair warning. And if ideas like this lead to people making Miri lightning fast while maintaining all the UB checking, I'll be even happier. :D
Miri isn't a fast interpreter. It's really slow. So I don't think it is actually a good environment to use to run code, I see it as more useful for debugging and testing.
That's understandable, but I think it ought to be good enough for typical playground snippets :)
I think a first step would be to create a new codegen backend which doesn't actually do any codegen and just dumps the metadata. That way we should be able to build a rustc which doesn't depend on llvm or other C code.
I guess that's one way, although I was wondering if Miri actually needs the main rustc crate or maybe it could be possible to depend only on some of the finer-grained rustc_* crates and avoid including codegen altogether?
The codegen backend is necessary for rustc_driver to work. Using a dummy codegen backend (<100LOC mostly copyable from the code I mentioned in https://github.com/rust-lang/miri/issues/722#issuecomment-489019303) is a lot easier than duplicating all the things rustc_driver does (>>1000LOC).
Fair enough.
I am currently trying to compile rustc for wasm (https://github.com/bjorn3/rust/tree/compile_rustc_for_wasm), but I am hitting a compiler bug: https://github.com/rust-lang/rust/issues/60540.
@bjorn3 I've rebased your branch onto master, updated deps and fixed cfg's from target_env to target_os - you can check it out at https://github.com/RReverser/rust/tree/compile_rustc_for_wasm.
Eventually it compiled successfully, but then ran into the same runtime validation issue with invalid code generated by Rust.
However, I recompiled in release mode and then it passed validation!
That got me thinking it should work now, but running the generated file with wasmtime or wasmer now seems to just hang. Some infinite loop somewhere perhaps?
@bjorn3 Oh... maybe it's just been taking so long (especially the compilation part). I've tried wasmer with --backend singlepass instead now, and it has actually worked!
$ ./wasmer run target/wasm32-unknown-wasi/release/rustc_binary.wasm --backend singlepass
Usage: rustc [OPTIONS] INPUT
Options:
-h, --help Display this message
--cfg SPEC Configure the compilation environment
-L [KIND=]PATH Add a directory to the library search path. The
optional KIND can be one of dependency, crate, native,
framework or all (the default).
-l [KIND=]NAME Link the generated crate(s) to the specified native
library NAME. The optional KIND can be one of
static, dylib, or framework. If omitted, dylib is
assumed.
--crate-type [bin|lib|rlib|dylib|cdylib|staticlib|proc-macro]
Comma separated list of types of crates
for the compiler to emit
--crate-name NAME
Specify the name of the crate being built
--edition 2015|2018
Specify which edition of the compiler to use when
compiling code.
--emit [asm|llvm-bc|llvm-ir|obj|metadata|link|dep-info|mir]
Comma separated list of types of output for the
compiler to emit
--print [crate-name|file-names|sysroot|cfg|target-list|target-cpus|target-features|relocation-models|code-models|tls-models|target-spec-json|native-static-libs]
Comma separated list of compiler information to print
on stdout
-g Equivalent to -C debuginfo=2
-O Equivalent to -C opt-level=2
-o FILENAME Write output to <filename>
--out-dir DIR Write output to compiler-chosen filename in <dir>
--explain OPT Provide a detailed explanation of an error message
--test Build a test harness
--target TARGET Target triple for which the code is compiled
-W, --warn OPT Set lint warnings
-A, --allow OPT Set lint allowed
-D, --deny OPT Set lint denied
-F, --forbid OPT Set lint forbidden
--cap-lints LEVEL
Set the most restrictive lint level. More restrictive
lints are capped at this level
-C, --codegen OPT[=VALUE]
Set a codegen option
-V, --version Print version info and exit
-v, --verbose Use verbose output
Additional help:
-C help Print codegen options
-W help Print 'lint' options and default settings
-Z help Print unstable compiler options
--help -v Print the full set of options rustc accepts
Oh... maybe it's just been taking so long (especially the compilation part).
Yes, it takes several minutes to compile it using wasmtime with cranelift as backend.
However, I recompiled in release mode and then it passed validation!
🎉 🎉 🎉
and it has actually worked!
I tried actually compiling something, but it errors with:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: StringError("operation not supported on wasm yet") }', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
I am currently trying to figure out where it errors.
Places needing patching:
- spawn_thread_pool must not spawn a thread.
- get_or_default_sysroot must not be called, as it needs std::env::current_exe. I am now passing an explicit --sysroot.
- build_session_ local working_dir is created using std::env::current_dir.
Edit: pushed bjorn3/rust@15f980f60947eb63c9186db80ec5264efdad53fa (based on @RReverser's branch).
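The kind of patch listed above can be sketched with a target check. This is a rough illustration only, not the code from the branch: working_dir and run_in_thread_pool are hypothetical helper names, and the non-WASI branches are simplified.

```rust
use std::path::PathBuf;

/// Hypothetical helper: avoid std::env::current_dir on WASI and use "." instead.
fn working_dir() -> PathBuf {
    if cfg!(target_os = "wasi") {
        PathBuf::from(".")
    } else {
        std::env::current_dir().unwrap_or_else(|_| PathBuf::from("."))
    }
}

/// Hypothetical helper: run the closure inline on WASI, where spawning a
/// thread is not supported, and on a separate thread everywhere else.
fn run_in_thread_pool<R: Send>(f: impl FnOnce() -> R + Send) -> R {
    if cfg!(target_os = "wasi") {
        f()
    } else {
        std::thread::scope(|s| s.spawn(f).join().expect("compiler thread panicked"))
    }
}

fn main() {
    let dir = run_in_thread_pool(working_dir);
    println!("working dir: {}", dir.display());
}
```

Using cfg! keeps both branches type-checked on every target, so the WASI path cannot silently rot on hosts that never exercise it.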
Now it errors at
thread 'main' panicked at 'unknown codegen backend llvm', src/librustc_interface/util.rs:277:18
Which is expected, as I had to remove the codegen backend dynamic loader. Will try to get https://github.com/bjorn3/rustc_codegen_cranelift to work with it.
Places needing patching
FWIW previously (before even filing this issue) I tried compiling rustc with Emscripten instead, which should, in theory, reduce the number of these places to patch, as it supports a bit more than WASI does. I haven't gotten too far though, because I tried to build a completely unpatched rustc and there were a few things that still didn't compile and probably needed fixes similar to those in your branch.
Which is expected, as I had to remove the codegen backend dynamic loader. Will try to get bjorn3/rustc_codegen_cranelift to work with it.
I thought the plan was to build it without any codegen, just with miri? Or do you want to build an actual full rustc?
I thought the plan was to build it without any codegen, just with miri? Or do you want to build an actual full rustc?
I want them both. :) I currently have rustc_codegen_cranelift hooked up, but rustc gives an error before rustc_codegen_cranelift can do actual codegen: can't find crate for `std`. Supporting miri will need that error to be fixed too.
can't find crate for `std`
Yeah for that I think you'll need to do the proper build (via x.py build) to build all components. I haven't had much luck with that yet due to failures in other crates which probably also need to be patched similarly to rustc itself.
Yeah for that I think you'll need to do the proper build (via x.py build) to build all components.
Seems like it doesn't even reach the rustc version check for the libraries. I added --sysroot $(rustc --print sysroot) and disabled the rustc version check, but it still gives the same error.
Switching from wasmer to wasmtime fixed it. It even got to the beginning of codegen.
Edit: filed https://github.com/wasmerio/wasmer/issues/434.
I am currently working on making miri compile for wasi, which this issue was actually about.
It seems to trap while calling ecx.run. :(
error while processing main module ../../target/wasm32-unknown-wasi/release/rustc_binary.wasm: Instantiation error: Trap occurred while invoking start function: wasm trap at 0x2a881eb82
I pushed the wip stuff to my branch.
@bjorn3 Left a comment on your MIRI commit on your branch.
@bjorn3 But also, I'm not sure why rustc is now depending on miri... shouldn't it be the other way around? (like in non-WASI version)
I did that to avoid recompiling every rustc crate, which is slow, and to avoid copying all the files into the directory layout rustc expects of a sysroot.
I'm not sure I understand what you're saying... neither should be affected by which crate you compile as an entry point.
I've changed my local copy of Rust & MIRI to do just that, and got miri.wasm successfully, but yeah, also hitting some trap.
I meant that I had already compiled all crates in rust/target. When switching to miri as crate root, I would have to recompile all crates into miri/target. By keeping rustc-binary as crate root, I could reuse rust/target as target dir.
When switching to miri as crate root, I would have to recompile all crates into miri/target
Not when you use miri at the src/tools/miri path - then it reuses the same workspace.
Yes, but I didn't think about that possibility. I just used the miri I had already cloned outside my rust clone.
Yeah, in conjunction with your comment on that commit it makes sense now. Shouldn't be hard to switch.
Things being patched (how patched / how to prevent in the future):
- measureme (rust-lang/measureme#43)
- jobserver (alexcrichton/jobserver-rs#13)
- backtrace (remove dep or stub out on wasm, alexcrichton/backtrace-rs#178)
- flate2 (enable rust_backend feature and disable default features, https://github.com/alexcrichton/flate2-rs/pull/194)
- std::env::current_dir (use . instead)
- std::path::Path::canonicalize (just don't)
- rustc_interface::util::get_resident (remove)
- memmap (use Vec<u8> on WASI https://github.com/danburkert/memmap-rs/issues/88#issuecomment-492320205)
What's wrong with flate2? It worked well in WASM for me (in other projects). See https://github.com/alexcrichton/flate2-rs/issues/161.
It didn't compile. I believe I will have to enable the rust_backend feature and disable the default features.
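For the memmap entry in the list of patched things, the "use Vec<u8> on WASI" fallback could look roughly like the sketch below. FileBytes is a hypothetical name, not a type from memmap or rustc; it simply reads the whole file into memory and exposes it as a byte slice, which is the part of a read-only memory map that callers usually rely on.

```rust
use std::{fs, io, ops::Deref, path::Path};

/// Hypothetical stand-in for a read-only memory map: the file is read fully
/// into a Vec<u8> instead of being mapped.
pub struct FileBytes(Vec<u8>);

impl FileBytes {
    pub fn open(path: &Path) -> io::Result<Self> {
        fs::read(path).map(FileBytes)
    }
}

impl Deref for FileBytes {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.0
    }
}

fn main() -> io::Result<()> {
    // Demo on the host: write a small file and read it back through FileBytes.
    let path = std::env::temp_dir().join("file_bytes_demo.bin");
    fs::write(&path, b"metadata")?;
    let bytes = FileBytes::open(&path)?;
    println!("read {} bytes", bytes.len());
    Ok(())
}
```

The trade-off is memory use: the whole file is copied up front, which is acceptable for a proof of concept but costs more than a lazily paged mapping for large libraries.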
Btw, when using Cranelift backend instead, the trap has extra detail:
thread 'main' panicked at 'wasm trap occured: memory out-of-bounds access', src/bin/wasmer.rs:359:51
Not that it's very helpful without a stacktrace.
That makes me suspect https://github.com/rust-lang/rust/blob/efa3c27f0ff21960b9309f8036dbf3e7416b9e52/src/librustc_mir/interpret/memory.rs#L743-L768. It is the only block of unsafe code in rustc_mir::interpret.
I tried compiling wasi rustc with debuginfo to get a backtrace with lldb, but wasmtime panics while processing the debuginfo (https://github.com/CraneStation/wasmtime/issues/144)
@bjorn3 Wow, this seems to be a decent stress test that uncovers all sorts of bugs in WASM tooling. Thanks for working on it!
Yeah, I expected rustc itself to be the hardest part of compiling it for WASM, not the tools. :)
https://github.com/rust-lang/rust/pull/60831 got merged recently, haven't been able to confirm that the miscompilation in debug mode is fixed though.
The measureme fix has been merged too, worth updating the list above?
Got some logs for the miri SIGSEGV:
It is not really helpful, as https://github.com/CraneStation/wasmtime/issues/144 forces me to compile it without debuginfo.
@bjorn3 What about wasmer? Does it use debug information at least for the stacktrace?
Wasmer seems to have some problems with WASI compliance (https://github.com/wasmerio/wasmer/issues/434): it used to not be able to read dirs, and now it can't read /home/bjorn/.cache/miri/HOST/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd.so.
@bjorn3 You might want to pass a more precise nested path in --dir - at least that did the trick for me before in a similar issue.
Passing --dir $MIRI_SYSROOT/lib/rustlib/x86_64-unknown-linux-gnu/lib as an extra argument, or replacing the --dir $MIRI_SYSROOT with it, doesn't work.
I see, interesting. Btw, do you think libstd.so is going to help here anyway? I suppose for execution in Wasm we'd need to compile libstd to Wasm as well...
I suppose for execution in Wasm we'd need to compile libstd to Wasm as well...
Actually not, we only need to read the metadata rustc stores in it, so telling rustc to "compile" the executable to be interpreted for the same target libstd was compiled for should just work:tm:.
I see. Does that mean we'll still need to ship the whole libstd.so to browsers and such or only the metadata parts?
It actually first tries to load libstd.rlib. This is just an ar archive, so it is possible to remove all object files from it and just keep the rust.metadata.bin file it contains in the archive.
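Stripping an archive down to one member can be sketched as follows. This is a minimal illustration under stated assumptions: it handles only GNU-style ar archives (60-byte headers, a "//" long-name table, "/N" references into it), it drops the "/" symbol table, and keep_only, resolve_name and raw_member are hypothetical helper names. Whether rustc accepts the result as an rlib is not verified here.

```rust
const MAGIC: &[u8] = b"!<arch>\n";
const HEADER_LEN: usize = 60;

/// Resolves a header name field, following "/N" into the long-name table.
fn resolve_name(name_field: &str, long_names: &[u8]) -> Result<String, String> {
    let raw = match name_field.strip_prefix('/') {
        Some(offset) => {
            let offset: usize = offset.parse().map_err(|_| "bad name field".to_string())?;
            let rest = long_names.get(offset..).ok_or("name offset out of range")?;
            let end = rest.iter().position(|&b| b == b'\n').unwrap_or(rest.len());
            String::from_utf8_lossy(&rest[..end]).into_owned()
        }
        None => name_field.to_string(),
    };
    Ok(raw.trim_end_matches('/').to_string())
}

/// Copies the archive, keeping the long-name table and the member named `keep`.
fn keep_only(archive: &[u8], keep: &str) -> Result<Vec<u8>, String> {
    if !archive.starts_with(MAGIC) {
        return Err("not an ar archive".into());
    }
    let mut out = MAGIC.to_vec();
    let mut long_names: &[u8] = &[];
    let mut pos = MAGIC.len();
    while pos + HEADER_LEN <= archive.len() {
        let header = &archive[pos..pos + HEADER_LEN];
        let field = |range: std::ops::Range<usize>| {
            std::str::from_utf8(&header[range]).map(str::trim_end).map_err(|e| e.to_string())
        };
        let name_field = field(0..16)?;
        let size: usize = field(48..58)?.parse().map_err(|_| "bad size field".to_string())?;
        let data_end = pos + HEADER_LEN + size;
        if data_end > archive.len() {
            return Err("truncated member".into());
        }
        let keep_member = match name_field {
            "/" => false, // symbol table: it refers to the dropped object files
            "//" => {
                long_names = &archive[pos + HEADER_LEN..data_end];
                true // copied verbatim so "/N" offsets stay valid
            }
            other => resolve_name(other, long_names)? == keep,
        };
        if keep_member {
            out.extend_from_slice(&archive[pos..data_end]);
            if size % 2 == 1 {
                out.push(b'\n'); // members are padded to an even offset
            }
        }
        pos = data_end + size % 2;
    }
    Ok(out)
}

/// Builds one raw member (header, data, padding) for the demo below.
fn raw_member(name_field: &str, data: &[u8]) -> Vec<u8> {
    let mut out = format!("{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n", name_field, 0, 0, 0, 644, data.len())
        .into_bytes();
    out.extend_from_slice(data);
    if data.len() % 2 == 1 {
        out.push(b'\n');
    }
    out
}

fn main() {
    let mut archive = MAGIC.to_vec();
    archive.extend(raw_member("//", b"rust.metadata.bin/\n"));
    archive.extend(raw_member("lib.o/", b"obj"));
    archive.extend(raw_member("/0", b"meta"));
    let stripped = keep_only(&archive, "rust.metadata.bin").unwrap();
    println!("{} -> {} bytes", archive.len(), stripped.len());
}
```

Copying the long-name table and the kept header verbatim avoids having to re-encode names, at the cost of leaving unused entries in the table.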
At least the combination of https://github.com/rust-lang/rust/commit/709120b32146e74c19ecb53fd58b2b108fa9096a and the wasmer fix in https://github.com/wasmerio/wasmer/pull/446 seems to fix the miri SIGSEGV.
Using wasmtime with that commit still gives SIGSEGV.
@bjorn3 What about wasmer? Does it use debug information at least for the stacktrace?
To get debug information from wasmer run it like:
cargo run --release --features debug -- ...
I'll add it to my todo list to document this somewhere (and also improve the signal:noise ratio while I'm at it)
Wasmer's WASI implementation is being prioritized based on need right now (until it becomes more stable and we define some spec tests), so please don't hesitate to file issues on Wasmer or ping us if there's anything we can do to help!
I'm super excited about your work here -- thanks for doing it!
Miri maintainer note: this is a fun project, but not something we currently intend to support officially. To keep maintenance manageable, Miri only supports running on platforms that rustc supports running on.
Compiling the whole Rustc to WASM is a pretty big undertaking for many reasons.
However, Miri doesn't need an actual codegen and many other parts of the whole Rustc, so I wonder how realistic it would be to compile it and the pieces it depends on to WASM instead? Are there any obvious blockers?
Mostly opening this to gauge interest and estimate complexity, as I believe there is an interest in running Rust directly in the browser on playground-like websites.
P.S. Despite what I said in the first sentence, this was actually done for Clang a while ago - https://tbfleming.github.io/cib/ - which includes LLVM compiled to WASM that, in turn, generates more WASM dynamically during runtime. In theory, it should be possible to do the same for Rust, especially since they share LLVM, but for now having just an interpreter could already be an interesting starting goal.