Tracking issue for supporting asm.js and WebAssembly without Fastcomp

tlively commented 7 years ago

Breaking Rust's dependency on Fastcomp will allow upgrades to Rust's LLVM to be much smoother because they won't depend on Fastcomp being updated. Smoother upgrades will allow LLVM to be kept up to date more easily (https://github.com/rust-lang/rust/issues/42389), which will be beneficial across the board but especially for WebAssembly as its LLVM backend matures. It is necessary that the asmjs and wasm targets emit object files instead of LLVM bitcode so that bitcode version mismatches between Rust and Emscripten won't be a problem. Work that needs to be done to break the dependency on Fastcomp includes:

[ ] finish wasm2asm and integrate it into Emscripten as a backend for asm.js (https://github.com/WebAssembly/binaryen/issues/1141)
[ ] make Emscripten take WebAssembly object files as input (blocked on LLD supporting WebAssembly https://groups.google.com/forum/#!topic/llvm-dev/BwFL_ulYX4E, https://reviews.llvm.org/D34851)
[ ] make wasm32-unknown-emscripten emit WebAssembly object files
[ ] update asmjs-unknown-emscripten to emit WebAssembly object files and use the future wasm2asm backend in Emscripten
[ ] remove PNaCl/NaCl support, since it depends on Fastcomp (#42420)

est31 commented 7 years ago

@tlively I guess #42420 would also be part of this? If so, can it be added to the list?

tlively commented 7 years ago

@est31 I'm not sure I understand why #42420 is part of this. Does NaCl/PNaCl support depend on Fastcomp as well?

est31 commented 7 years ago

@tlively I'm no LLVM expert, but when I look into rust's fork of LLVM, then lib/Target/JSBackend/NaCl/ is the place I can find the NaCl backend while upstream LLVM seems to lack the JSBackend subdir entirely. I also can't find any meaningful references to NaCl in upstream LLVM so it wasn't moved or something. It looks like an addition by fastcomp.

dylanmckay commented 7 years ago

Can confirm - the entire JavaScript backend lives in lib/Target/JSBackend/.

Every JS-specific modification to LLVM outside of this directory is decorated like so

@LOCALMOD-BEGIN
< .. js specific code .. >
@LOCALMOD-END

It looks like an addition by fastcomp.

It is

est31 commented 6 years ago

cc #45905

pepyakin commented 6 years ago

(continuing discussion from https://github.com/rust-lang/rust/pull/45905)

@RReverser

@pepyakin At least for Node.js side spawning process, net etc. are pretty easily implemented. Character for path can be taken from host system and so on. I think it makes sense to experiment with reusing Emscripten libraries for this as @badboy suggested.

But what should do Web version then? Should it implement fs, net, env, etc emulation in JS? Or should it trap? If we chose to implement emulation then we will need a JS shim library, which will call "real" APIs or emulated ones depending on which "sub-environment" we running on. But then, what if user wants to run only one "sub-environment"? Or user doesn't want either fs or net, or nothing at all? Emscripten solves this problem with preprocessing of JS files. Should we do our own preprocessing? But should we? Given that there is emscripten which do the job already?

Applications of wasm should be truly beyond the web. Fast, safe and deterministic execution seems to be pretty desired properties!

IoT,
plugins, scripts, and other embeddings in larger programs,
deterministic execution, especially for p2p (i.e. blockchain applications),
universal drivers,
mobile apps,
desktop and server apps,

Some of environments can implement complete std. However, others can't. These features can or cannot be supported:

processes,
filesystems,
TCP/UDP,
environment variables,
stdin, stdout
etc...

Even not all WASM features are always desired (due to performance reasons and/or need of deterministic execution):

threads 🦄,
gc 🦄,
simd 🦄,
floating points,
growing memory,

I think this makes wasm more like an ISA than an end-user platform. (Not to mention that we even have problems combining JS/Web and JS/Node).

To properly handle all this zoo, I think, we need something like portability lint and maybe a few more triples, like this strawman:

wasm32-unknown-unknown - generic target that assumes nothing about its environment. Suitable for the web.
wasm32-unknown-node - target specially for node. I think it might support all of the std?

Well other environments might provide own implementations of these system bindings if they need to.

It will make the portability lint useless, won't it?

badboy commented 6 years ago

I still think having wasm32-unknown-unknown providing the bare minimum (with a libstd where it makes sense) will get us a long way.

But what should do Web version then? Should it implement fs, net, env, etc emulation in JS? Or should it trap?

It should implement shims for these things.

If we chose to implement emulation then we will need a JS shim library, which will call "real" APIs or emulated ones depending on which "sub-environment" we running on.

Yes, we will need this shim. That's the same as it works in Emscripten today.

But then, what if user wants to run only one "sub-environment"?

If it is possible to modularize these different things, it would be up to the user to load the necessary things before loading the wasm module (again: I did not look into Emscripten's shims yet and how easy it would be to extract those things). This does not even have to be part of Rust itself btw. But I also don't expect every module out there needing or wanting to have this environment provided (JavaScript<->Wasm interaction is still the slower thing in the whole execution).

(I'm gonna read that portability lint RFC now)

pepyakin commented 6 years ago

I still think having wasm32-unknown-unknown providing the bare minimum (with a libstd where it makes sense) will get us a long way.

I agree on this too. But it seems we disagree on what "bare minimum" means : ) For me, "bare minimum" doesn't include fs, net, processes, rng, etc.

That's the same as it works in Emscripten today.

This does not even have to be part of Rust itself btw.

OK, here is my direct question then: why don't use Emscripten if one needs full-blown std library with support of Web, Node? 😃

steveklabnik commented 6 years ago

IMO, wasm32-unknown-unknown should have no shims. wasm32-unknown-web would be something more like emscripten, with shims for a bigger chunk of functionality.

non-web applications of wasm won't want the shims, and code that doesn't use the shims won't want the shims. Keep the minimum small, build from there, IMO.

SimonSapin commented 6 years ago

Rather than wasm32-unknown-unknown having an std crate with most of the APIs returning an "unsupported" error, could it not have std at all? If there’s functionality in std that is supported in "pure" wasm, isn’t that a sign that this functionality belongs in another crate like libcore or liballoc?

pepyakin commented 6 years ago

@SimonSapin yeah, maybe.

One thing that comes to my mind is threads.

The WASM threads proposal (it's still WIP though) suggests adding atomic.wake & atomic.wait operations, which roughly correspond to thread::park/Condvar.

pepyakin commented 6 years ago

There is one more point: things that only could be implemented in terms of libcore AND liballoc automatically goes into libstd. For example, io::Read and io::Write. I can imagine usage of them in pure wasm context.

Also, if I squint enough, I can imagine implementation of HashMap outside of std.

SimonSapin commented 6 years ago

There is one more point: things that only could be implemented in terms of libcore AND liballoc automatically goes into libstd. For example, io::Read and io::Write.

liballoc depends on libcore, so I don’t think depending on both is a sufficient reason to have things in libstd. In the case of io::Read and io::Write it’s probably rather because they depend on io::Error, which depends on std::sys for OS errors.

As to HashMap, I believe the reason it’s not in liballoc is that its default hasher requires a pseudo-random number generator to be seeded from the OS.

NikVolf commented 6 years ago

HashMap being hasher agnostic and residing in liballoc would be helpful btw

pepyakin commented 6 years ago

liballoc depends on libcore, so I don’t think depending on both is a sufficient reason to have things in libstd. In the case of io::Read and io::Write it’s probably rather because they depend on io::Error, which depends on std::sys for OS errors.

Oh, I see. That's unfortunate.

As to HashMap, I believe the reason it’s not in liballoc is that its default hasher requires a pseudo-random number generator to be seeded from the OS.

Agree with @NikVolf on that

SimonSapin commented 6 years ago

I don’t know if a type item can specify a default type parameter (in the case of HashMap, S = RandomState) while still allowing it to be overridden.

RReverser commented 6 years ago

@simonsapin I imagine something like pub type HashMap<K, V, S = RandomState> = core::HashMap<K, V, S> in std would work (where core variant doesn't have any default).

SimonSapin commented 6 years ago

It would be alloc rather than core, but yeah that might work. Who feels like submitting a PR? :)
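
A minimal sketch of the alias idea discussed above, to show that a type alias can carry a default type parameter while still letting callers override it. `RawMap` and `Map` are hypothetical names: `RawMap` stands in for a hasher-agnostic map that could live in alloc, and `Map` stands in for the std-side alias that defaults the hasher to `RandomState`. The bucket logic is deliberately naive and is not how std's HashMap works.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};
use std::mem;

/// Hypothetical hasher-agnostic map: note that `S` has no default here.
pub struct RawMap<K, V, S> {
    buckets: Vec<Vec<(K, V)>>,
    build_hasher: S,
}

impl<K: Hash + Eq, V, S: BuildHasher> RawMap<K, V, S> {
    pub fn with_hasher(build_hasher: S) -> Self {
        let buckets = (0..16).map(|_| Vec::new()).collect();
        RawMap { buckets, build_hasher }
    }

    fn bucket_index(&self, key: &K) -> usize {
        let mut hasher = self.build_hasher.build_hasher();
        key.hash(&mut hasher);
        (hasher.finish() % self.buckets.len() as u64) as usize
    }

    /// Inserts a value, returning the previous value for the key if any.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        let index = self.bucket_index(&key);
        let bucket = &mut self.buckets[index];
        for entry in bucket.iter_mut() {
            if entry.0 == key {
                return Some(mem::replace(&mut entry.1, value));
            }
        }
        bucket.push((key, value));
        None
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        let index = self.bucket_index(key);
        self.buckets[index].iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }
}

/// `new` only exists where an OS-seeded hasher is available.
impl<K: Hash + Eq, V> RawMap<K, V, RandomState> {
    pub fn new() -> Self {
        Self::with_hasher(RandomState::new())
    }
}

/// Hypothetical std-side alias: same type, but `S` defaults to `RandomState`.
pub type Map<K, V, S = RandomState> = RawMap<K, V, S>;

fn main() {
    // The default applies when `S` is omitted in type position.
    let mut seeded: Map<&str, i32> = Map::new();
    seeded.insert("a", 1);

    // The default can still be overridden with any other `BuildHasher`.
    let mut fixed: Map<&str, i32, BuildHasherDefault<DefaultHasher>> =
        Map::with_hasher(BuildHasherDefault::default());
    fixed.insert("a", 2);

    println!("{:?} {:?}", seeded.get(&"a"), fixed.get(&"a"));
}
```

Whether the real HashMap could be split this way without breaking existing code is a separate question that this sketch does not answer.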

RReverser commented 6 years ago

@pepyakin

OK, here is my direct question then: why don't use Emscripten if one needs full-blown std library with support of Web, Node? 😃

For me, the reason not to use Emscripten is to avoid duplication and complexity in the building toolchain and not because of the libraries. On the contrary, having emulation of native APIs that works on Node.js/Web has always been the most exciting part of Emscripten to me, as it allows writing fully native apps and libraries and running them on different environments without any hassle.

pepyakin commented 6 years ago

For me, the reason not to use Emscripten is to avoid duplication and complexity in the building toolchain and not because of the libraries.

Can you elaborate on the duplication and complexity part?

RReverser commented 6 years ago

Can you elaborate on the duplication and complexity part?

It's about presence of LLVM in both Rust and Emscripten when you could just use Rust directly (and LLVM versions of both currently need to be kept compatible when you use Rust's -emscripten- targets).

Also, Emscripten evolves much more slowly and is harder to contribute to in my experience, so when you need to implement something Rust-specific for the WASM/JS output, it might take much longer to try and implement it on Emscripten side than if we had everything done only on Rust side.

cretz commented 6 years ago

I'll drop my two cents here as an author of a non-web WASM backend: Emscripten and the idea that JS emul needs to be Rust's concern is off. The WASM community needs to get together and create a libc-esque set of abstractions that backends can support (like emscripten or mine). In the meantime, I'd really like to not see low-level impls for a specific backend (JS/Web) be part of the Rust lang. That the shim exists already is enough. If you want JS support, do it elsewhere external to the lang.

cretz commented 6 years ago

@chpio - Meh, the abstraction doesn't really matter. It can just be nix syscalls like emscripten does for all I care. Just WASM import what you expect and let the backend fill em in, but of course a consistent abstraction would be nice. But the real nice thing would be if all WASM frontends could define something they share, but that seems unlikely (Emscripten basically uses libc and other libs to assume this).

I just tossed my opinion in here that it should not be part of the Rust-lang repo (or even the core devs' concern) to implement JS forms of the stdlib. I haven't yet played w/ the unknown-unknown compilation target yet, but one way would be to have a separate wasm file emitted of all the abstracted functions and just have unreachable as the one and only command for all functions. Then just import that into the real wasm file...it can be supplanted by backend authors as need be.

RReverser commented 6 years ago

to implement JS forms of the stdlib

Yeah, obviously either way Rust won't know implementation details of JS side, so for what I care, it can be a separate npm package. What's important here is to have calls out, and they will be needed and same no matter what target you use (whether it's JS or non-JS).

chpio commented 6 years ago

https://www.tockos.org/blog/2017/crates-are-not-safe/#usage-of-the-standard-library-is-pervasive

Any dependency we would want to use for Tock needs to include #![no_std] so that the compiler does not try to include the standard library. Again we surveyed all of the crates on crates.io and found that 93% (11488/12360) of crates use the standard library (i.e. do not have #![no_std]). [...] However, when including required dependencies, the number increases to 97% (12023/12360).

alexcrichton commented 6 years ago

Hey all! Exciting to see all the current discussion on this! I figured I could help throw in my viewpoints as well.

Future of `wasm32-unknown-unknown`

Overall I'm quite happy with how this target is shaping up. You've got access to big chunks of the standard library (aka libcore + liballoc) and I think that with the upcoming proposals we'll be able to just start filling out the implementation of types like Condvar for example (as well as Mutex) using the various wasm instructions.

The biggest blocker I know of today to "big usage" is the fact that the compile time is dog slow and you get a bunch of LLVM asserts if you don't compile with optimizations. The "dog slow" compile times are because we force LTO on everyone (wheee!) and will get fixed with lld. The WebAssembly support of lld has been merged upstream into lld itself, but unfortunately we don't get too far in using it today. Once we solve this problem though I think we'll be in great shape for the target itself.

More broadly I also feel somewhat uncomfortable precisely how the wasm target interacts with the outside world. For example extern { fn foo(); } is required to come from the env module, but presumably the env there should be configurable? There's various other bits and pieces too, but I'd in general recommend that anyone looking to use this right now is at least aware of these issues and how the target may get tweaked in the future wrt these precise integration/exposure details.

Future of a wasm libstd

I'm not particularly happy about all the errors returned from src/libstd/sys/wasm/*.rs. I think modules like std::{thread, fs, net} just shouldn't exist on the wasm target. Unfortunately though the primary enabler for this, the portability lint, is not currently implemented. Additionally I'm not sure libstd is ready for such an organization (slicing and dicing) yet as well, but we can probably discuss more with the portability lint. In the meantime I think that "return errors everywhere" is the best solution we have today.
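
For readers unfamiliar with the "return errors everywhere" approach, here is a rough sketch of what such stubs can look like. The function names (`unsupported`, `read_file`, `spawn_thread`) and the error message are assumptions made for illustration and are not copied from src/libstd/sys/wasm.

```rust
use std::io;

/// Hypothetical helper: every unsupported operation funnels through here.
fn unsupported<T>() -> io::Result<T> {
    Err(io::Error::new(
        io::ErrorKind::Other,
        "operation not supported on this target",
    ))
}

/// Hypothetical filesystem entry point: it type-checks like the real thing
/// but can only ever fail at runtime.
pub fn read_file(_path: &str) -> io::Result<Vec<u8>> {
    unsupported()
}

/// Hypothetical thread entry point with the same behaviour.
pub fn spawn_thread() -> io::Result<()> {
    unsupported()
}

fn main() {
    match read_file("data.txt") {
        Ok(bytes) => println!("read {} bytes", bytes.len()),
        Err(err) => println!("error: {}", err),
    }
}
```

The trade-off is visible here: code that uses these APIs still compiles for the target, and the failure only shows up when the call is actually made.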

I'm also not particularly happy about modules like this one. That and how 1.0 % b may get lowered to fmod by LLVM. These sorts of functionality silently may require imports when instantiating a wasm module which is pretty unfortunate.
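
To make the `1.0 % b` case concrete, this is the kind of innocuous-looking source that is meant. The function name is made up, and whether the compiled module actually ends up needing an `fmod` import depends on the compiler version and target, so treat that part as the concern described above rather than a guarantee.

```rust
/// Floating-point remainder: nothing in the source hints at an external call,
/// yet the backend may lower `%` on floats to a call to `fmod`.
pub fn remainder_of_one(b: f64) -> f64 {
    1.0 % b
}

fn main() {
    println!("{}", remainder_of_one(0.75));
}
```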

In general we, in the standard library, have no means of importing something that you as the instantiator don't have to worry about. For example when you say use std::thread you need not concern yourself that it imports a bunch of stuff. Similarly if we decided to implement f64::tan as a call to Math.tan in JS you ideally wouldn't have to worry yourself about that either!

Unfortunately though I don't think we have many options available to us. It sounds like one day wasm may be a full-fledged JS module citizen, but until then I think we need to keep our dependencies as slim as can be. That way users who instantiate a wasm module will need to provide as little as possible to instantiate it.

Wasm + node/web

I'd personally think that we can't really do anything here right now. Basically all due to what I mentioned above of how we can't import anything without actually forcing you to define it. That is, even if we knew we were executing in node, it's not like we have any new superpowers we could access!

Perhaps eventually when we can import functionality directly I do think two targets could make sense. For example a node target could make things like println! actually work by default (man that would be nice).

Wasm pipelines

One possible solution to this import problem may be some sort of standard pipeline though. I think that the standard pipeline for the near future will need to include wasm-gc if you care about binary size (as it strips compiler-rt goo that you don't need). I also think that wasm-opt from the Binaryen toolkit will also want to be a member of the standard pipeline as it can shrink code size even further after wasm-gc.

I wonder if maybe there's a pipeline or something like that for dealing with imports/exports? Like what if we could automatically hook up Math.random into HashMap::new() without you having to do anything? It'd be neat but I'm also not entirely sure that it'd be 100% possible unfortunately.

I'm curious if others have thoughts though!

alexcrichton commented 6 years ago

Oh right I should also say that I don't see the Emscripten targets going away any time soon. They're still incredibly useful if you're porting existing applications as you really do want all those emulation layers. Maybe like in a decade we can remove them if no one is using Emscripten, but for now I think Emscripten still has a solid enough niche that we should continue to support it as we do today.

jonhere commented 6 years ago

2 cents. Locked on env isn't a big deal IMO, just polish. it just means you can't as easily bulk import something say Math. old link (semi) on the issue. Being able to interact with the WebAssembly table would give a small speed boost with dynamically assigned functions.

Threads will come in time, (hoping not too much.) My thought (not read anywhere) is only the main instance thread will be able to call imports.

I love that wasm32-unknown-unknown is minimal. In terms of getting println! and other libstd items: maybe a solution is to expose a back-end Trait that programmers could supply their own structure to, (copy default one / include crate.)
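
One way to read the back-end Trait suggestion is sketched below. Everything here is hypothetical (`Backend`, `NullBackend`, `BufferBackend`, `print_line`); nothing like it exists in libstd, and a real design would have to cover far more than stdout.

```rust
/// Hypothetical back-end trait: the embedder supplies what the target cannot.
pub trait Backend {
    fn write_stdout(&mut self, text: &str);
}

/// Default back-end for a bare target: output is silently discarded.
pub struct NullBackend;

impl Backend for NullBackend {
    fn write_stdout(&mut self, _text: &str) {}
}

/// A back-end a host might supply; here it only buffers in memory.
pub struct BufferBackend {
    pub output: String,
}

impl Backend for BufferBackend {
    fn write_stdout(&mut self, text: &str) {
        self.output.push_str(text);
    }
}

/// Hypothetical println!-like helper routed through the supplied back-end.
pub fn print_line<B: Backend>(backend: &mut B, message: &str) {
    backend.write_stdout(message);
    backend.write_stdout("\n");
}

fn main() {
    let mut null = NullBackend;
    print_line(&mut null, "dropped");

    let mut buffer = BufferBackend { output: String::new() };
    print_line(&mut buffer, "hello from wasm");
    print!("{}", buffer.output);
}
```

With static dispatch like this, a program that only ever uses `NullBackend` pays nothing for the host integration it does not need.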

Debugging is a pain point. "Runtime Error: unreachable executed" this message should ideally be better. Stack trace: wasm-function[55]@ read this no idea if firefox would use it but can see no mention of debug in the wast.

RReverser commented 6 years ago

Like what if we could automatically hook up Math.random into HashMap::new() without you having to do anything? It'd be neat but I'm also not entirely sure that it'd be 100% possible unfortunately.

@alexcrichton Why not emit own JS in addition to just .wasm that would include needed imports? (similarly to Emscripten)

Then pure-wasm users will just provide own imports, but at least all the Web users of WebAssembly will be able to instantiate with correct JS counterparts without hassle.

alexcrichton commented 6 years ago

@jonhere compiling with debuginfo (-g) I think may help at least show more functions in the stack trace, but I agree that the debugging story isn't great.

@RReverser initially the new target was intended to be as bare as possible (no extra js fluff) but I think we'll go down the road at some point of emitting js glue which would give us a lot more flexibility of what to do and how to implement it. I'm just wary to go down that path too quickly!

rpjohnst commented 6 years ago

I would prefer wasm32-unknown-unknown to remain as bare-bones and freestanding as possible, even if that means not supporting the stdlib out of the box. Having a target like that is important, similar to the targets people use for kernel dev. (I suppose it would be sufficient to let people make their own target.json for that case, though.)

If we start emitting JS glue, it should probably be part of a separate target. wasm32-unknown-web, or something. That target could subset the stdlib (via the portability lint) to only what the web platform provides without any emscripten-like emulation. It would be good to keep the JS glue as minimal as possible here, with an eye toward being able to remove it if (when?) wasm allows direct access to the web platform without imports.

Concepts like "the system allocator" muddy the waters here somewhat, though- the web platform probably isn't ever going to provide that, so making a statically-linked one like dlmalloc-rs the default allocator for non-#[no_std] use of wasm32-unknown-unknown might be reasonable.

RReverser commented 6 years ago

I would prefer wasm32-unknown-unknown to remain as bare-bones and freestanding as possible, even if that means not supporting the stdlib out of the box.

What's the difference for users who don't care about stdlib? Right now, with "bare bones" approach it will emit errors using panic!s in Rust code, with implicit imports it will still emit errors, but only due to missing JS counterparts. At the same time, for Web users it will just work as soon as you import JS glue. I see it as win-win for both sides.

rpjohnst commented 6 years ago

The idea is to support code that doesn't have any implicit imports, without having to strip out a bunch of stuff. With the portability lint and/or no stdlib, this is totally doable and it's an important use case.

RReverser commented 6 years ago

sigh I really hoped that with this new target Rust would finally have a hassle-free support for WebAssembly on Web / Node.js - something Emscripten could never do due to really slow turnarounds and hard integration with JS, but it starts sounding like most people are against it and transparent integration of Rust + JS is still not happening 😢

SimonSapin commented 6 years ago

I don’t think anyone has argued that such integration should not exist at all, only that it should somehow not be mandatory. Maybe that means having two targets.

rpjohnst commented 6 years ago

@RReverser I am not arguing that at all! That support should exist and will certainly build on the work done for wasm32-unknown-unknown. It's just that the -unknown triple is a bad place to put platform-specific glue (i.e. web/node.js). wasm32-unknown-node and/or wasm32-unknown-web would be better places for that.

shepmaster commented 6 years ago

In the bikesheddy realm, I'd vote for wasm32-rust-node / wasm32-rust-web for such a target. 😄

est31 commented 6 years ago

As stdweb has gained support for the unknown target, I think we now can talk about the concrete proposal of using components from it from inside libstd. I've opened a thread on irlo.

jontro commented 6 years ago

But is Math.* really platform specific? I thought it was defined in the ES standards. As long as it's not specific to a particular target it should be fine for the unknown target?

Pauan commented 6 years ago

@jontro Wasm != JavaScript.

Wasm is specifically designed to work without JavaScript, so if you're compiling to wasm you can't assume that JavaScript APIs exist (because they might not exist!).

So, you can't use JavaScript APIs (including Math) in wasm32-unknown-unknown, but it's okay to use JavaScript APIs in wasm32-unknown-node and wasm32-unknown-web

est31 commented 6 years ago

What @Pauan said is also noted on the webassembly website:

Non-Web environments may include JavaScript VMs (e.g. node.js), however WebAssembly is also being designed to be capable of being executed without a JavaScript VM present.

So we should keep a target that doesn't rely on any js at all. wasm32-unknown-unknown seems like a good fit.

jonhere commented 6 years ago

(Not arguing about not keeping it minimal or anything.)

So, you can't use JavaScript APIs

You don't directly use them at the moment, so there is nothing to stop the target having more. WebAssembly defines the API for imports and exports. When you write your rust function args (bool, *const f64) the compiler converts them to WebAssembly's 4 data types. Something similar happens with the javascript engine; (the bool (,etc.) rust sends becomes Number; you can send javascript Boolean to bool the other way.)
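
As an illustration of the signature mentioned above, here is a small function taking (bool, *const f64). The name `scale_first` is invented, and export attributes are left out so the snippet also builds and runs natively. The comments describe the usual lowering on wasm32, where the value types are i32, i64, f32 and f64.

```rust
/// On wasm32 the `bool` and the pointer are each passed as an `i32`,
/// while the `f64` return value stays an `f64`.
pub extern "C" fn scale_first(enabled: bool, values: *const f64) -> f64 {
    if !enabled || values.is_null() {
        return 0.0;
    }
    // Safety: the caller promises `values` points at a readable f64.
    unsafe { *values * 2.0 }
}

fn main() {
    let values = [1.5_f64, 9.0];
    println!("{}", scale_first(true, values.as_ptr()));
}
```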

Pauan commented 6 years ago

@jonhere WebAssembly is not always running in a JavaScript environment. WebAssembly has been specifically and intentionally designed so that it is completely separate and unrelated from JavaScript.

There are already implementations for running WebAssembly in C, running WebAssembly on the JVM, and even running WebAssembly on your washing machine.

Obviously none of those implementations have the JavaScript APIs available. The JavaScript APIs simply do not exist at runtime at all, so no matter what Rust does, you cannot use the JavaScript APIs, period. You will get a runtime panic (at best).

RReverser commented 6 years ago

@Pauan But it really doesn't matter whether API bindings on the other side are provided by JS, C, Java or anything else. All that matters is they are imported on the Rust side (since there are already some required by LLVM anyway), and all these implementations are still free to provide own native bindings on the other side without taking any interop away from WASM.

rpjohnst commented 6 years ago

Again, since native wasm doesn't provide Math.*, -unknown targets should not use it. This is already how things work with other -unknown targets, both in Rust and in C.

If you want to add an import yourself and provide a Math.*-like function from Javascript, C, or anything else, that's fine, nobody's stopping you. But the compiler adding those imports automatically is a non-starter.

Edit: somewhat of an exception is "builtin" functions in C- compilers will generate calls to things like memcpy without you writing them in the source, even in freestanding mode on an -unknown target. However, this can be opted out of (-fno-builtin on GCC), and those functions can be provided from within C anyway.

CryZe commented 6 years ago

Yes, it certainly shouldn‘t be JavaScript APIs that it expects, however the std should instead define symbols it requires for all of the relevant APIs (Instant, stdin, stdout, ...). These symbols can then be provided by whatever is executing the WebAssembly and a JavaScript binding that automatically provides these symbols could be auto generated by the compiler.

Pauan commented 6 years ago

@RReverser If there is an API that can be provided by multiple targets, then the API can use conditional compilation so that it will work on multiple targets.

But even in that situation, it still won't work for the wasm32-unknown-unknown target, because the wasm32-unknown-unknown target is supposed to contain no platform-specific code.

When you compile your code with the wasm32-unknown-unknown target, you are saying "I want this code to work on all WebAssembly implementations". Therefore the only code that you can use is native wasm code, no platform-specific code.

Of course if you use a different target, such as wasm32-unknown-web, then you can use non-wasm bindings (such as JavaScript bindings), because your specified intention is different. In that case of course the compiler should automatically use the JavaScript APIs, so everything works smoothly and seamlessly.

@CryZe That's probably a good idea! That still won't apply to wasm32-unknown-unknown, but it would be useful for the other wasm32-* targets.
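
The conditional compilation mentioned above could look roughly like this. `platform_name` and `greeting` are made-up names, and the sketch only switches on `target_arch`, which exists today. Telling a web target from a node target would need a cfg key that the proposed (and so far hypothetical) wasm32-unknown-web / wasm32-unknown-node triples would have to define.

```rust
/// Implementation selected when compiling for any wasm32 target.
#[cfg(target_arch = "wasm32")]
pub fn platform_name() -> &'static str {
    "wasm32"
}

/// Fallback implementation for every other architecture.
#[cfg(not(target_arch = "wasm32"))]
pub fn platform_name() -> &'static str {
    "native"
}

/// Callers see one API regardless of which implementation was compiled in.
pub fn greeting() -> String {
    format!("running on {}", platform_name())
}

fn main() {
    println!("{}", greeting());
}
```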

CryZe commented 6 years ago

What‘s the benefit of a largely broken std? The std should define a proper interface it communicates with. Only the subset of this interface that you actually use should be compiled in, so you have all the benefits of having std, while not getting random panics at runtime. If you truly never want to interact with an „OS“, then why even use std in the first place? That‘s what core is for.

RReverser commented 6 years ago

target is supposed to contain no platform-specific code

Problem is, it's not a platform-specific code, as you saw in the examples, such bindings are required even for some basic math operations. And from there it's not far to go to implementing std for those who compile against std as @CryZe says.

then you can use non-wasm bindings (such as JavaScript bindings)

As I said, they don't have to be JS bindings, they can be Java bindings, C bindings or anything else, so -web suffix doesn't make any sense as they're not Web- or generally platform-specific bindings in the first place. They're just basic "kernel functions" that Rust WASM would depend upon.

the std should instead define symbols it requires for all of the relevant APIs

This is a good idea IMO.

CryZe commented 6 years ago

Thinking about it, since this "kernel interface" is supposed to be somewhat generic, maybe it even makes sense for Rust to have this outside of the wasm target, as a flexible way to easily hook onto any kind of OS / runtime.
