Comlink seems like a nice way of setting up message-passing between workers, using async functions: https://github.com/GoogleChromeLabs/comlink.
This would involve a bit of refactoring to make webR into a class that can be wrapped by comlink.
@georgestagg What do you think about taking this opportunity to define the class within a TypeScript module? Winston recommended using TypeScript to benefit from type checking and refactoring tools. I've experimented with it and the language is nice, and the LSP features do make a difference.
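To make that concrete, here is a minimal sketch of what a Comlink-wrapped webR class could look like. The class name, methods, and file names are invented for illustration and are not webR's actual API:

```typescript
// --- webr-worker.ts (runs inside the web worker) --- illustrative only
import * as Comlink from 'comlink';

export class WebR {
  async init(): Promise<void> {
    // ... load and initialise the Emscripten/wasm module here ...
  }

  async evalR(code: string): Promise<string> {
    // ... hand the code to the R REPL and collect its output ...
    return `would evaluate: ${code}`;
  }
}

Comlink.expose(new WebR());
```

```typescript
// --- main.ts (runs on the main thread) ---
import * as Comlink from 'comlink';
import type { WebR } from './webr-worker';

const worker = new Worker(new URL('./webr-worker.ts', import.meta.url), {
  type: 'module',
});
const webR = Comlink.wrap<WebR>(worker);

await webR.init();                      // each method call becomes an async RPC
console.log(await webR.evalR('1 + 1'));
```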
Restructuring the webR module and moving to TypeScript sounds like a good idea. It will require a large rewrite but will bring us closer to how Pyodide does things.
Comlink looks interesting! I actually don't mind the message-passing with workers so much, but something like comlink might become more useful later on when we're having to do more complicated stuff like pass backing buffers around for things like the plotting canvas.
If I can get a good chunk of time this week, I'll remind myself of how the Pyodide TypeScript modules are set up and see if I can make a start on refactoring webR.js to be built and work in a similar way. Possibly we will be able to use very similar (if not identical) code for the initial setup of the environment before R starts running.
In Safari I'm getting this error with web workers:
RangeError: Maximum call stack size exceeded.
It looks like this is a known issue: https://github.com/pyodide/pyodide/issues/441#issuecomment-678140870
So I think we'll have to make using web workers configurable until we figure this out.
It happens when ASYNCIFY is turned on too (#8). I also saw the same thing when I was experimenting with workers a while ago.
I think I saw a post once that did some experiments and found Safari's JS call stack is smaller than on Chrome or Firefox.
Thanks for the info. I've just found out that the wasm actually loads fine in this case and the overflow occurs when making an RPC call with Comlink.
It isn't Comlink that is causing the overflow; it only appeared so because of the deterministic way the dispatched events / control flow lined up.
It looks like this is the XHR emscripten fetch logic that causes the overflow in Safari web workers. So I thought we could take this opportunity to improve the way we get inputs from stdin by blocking in the worker thread. We can now do that using Atomics.wait(), which is now supported by all main browsers (Safari started supporting it in December 2021). And.... it turns out that a Pyodide developer forked Comlink to implement support for blocking calls via Atomics! https://github.com/hoodmane/synclink
Edit: Looks like this is a very active topic on the Pyodide side: https://discuss.python.org/t/wasm-python-synchronous-io-working-group/15509
Edit2: Since these APIs are in development, I suggest that for now we keep using Comlink and use Atomics manually, it doesn't look complicated. Then later on we can switch to the infrastructure the Pyodide devs have settled on once it's stable.
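For what it's worth, manually using Atomics for stdin could look roughly like this. This is a sketch, not webR code; it assumes the page is cross-origin isolated and that both sides were handed the same SharedArrayBuffer via postMessage():

```typescript
// Shared layout: the Int32 slot is a flag/length field, the rest holds UTF-8 bytes.
const sab = new SharedArrayBuffer(4 + 1024);
const flag = new Int32Array(sab, 0, 1);
const data = new Uint8Array(sab, 4);

// Worker side: block the thread until the main thread provides a line.
function readLineSync(): string {
  Atomics.store(flag, 0, 0);
  Atomics.wait(flag, 0, 0);                 // sleeps while flag[0] === 0
  const length = Atomics.load(flag, 0);     // main thread wrote the byte length
  return new TextDecoder().decode(data.slice(0, length));
}

// Main thread side: copy the input into the buffer and wake the worker.
// (The sketch ignores empty lines and lines longer than the buffer.)
function provideLine(line: string): void {
  const bytes = new TextEncoder().encode(line);
  data.set(bytes);
  Atomics.store(flag, 0, bytes.length);
  Atomics.notify(flag, 0);
}
```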
It looks like this is the XHR emscripten fetch logic that causes the overflow in Safari web workers.
I've been working on removing the XHR fetching in the R patches, I think I have something working here: https://github.com/georgestagg/webR/commit/f9baacb3ab6238285f1e14a9a18f03b3de1f544f
EDIT: I don't know how compatible this will be with your own ideas for blocking with stdin, though.
IIUC with the blocking approach we should be able to remove most or all modifications to the REPL code, as well as the XHR fetching stopgap. Also there should be no need to hook into the Emscripten main loop. I'm currently experimenting in this direction.
Actually if we block the worker thread instead of yielding back to the event loop, we can no longer receive message events from the main thread. Instead we'd have to implement our own event loop based on the Atomics primitives. This is exactly what synclink does. Instead of reimplementing that (even if that does sound fun), I think we should just go ahead and use synclink.
One advantage of running our own event loop is that we can run it at any safe point of the program. I'm thinking about nested REPLs in particular, as created by browser() or utils::menu().
The Atomics approach will require us to serve webR with these http headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
If that proves to be limiting, we can use a service worker + blocking XHR approach instead, as implemented in https://github.com/alexmojaki/comsync (a fork of synclink).
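For reference, serving those headers is straightforward in most setups; a hypothetical Express-based dev server could do it like this (the 'dist' path is just a placeholder for wherever the built webR files live):

```typescript
import express from 'express';

const app = express();

// Add the cross-origin isolation headers needed for SharedArrayBuffer.
app.use((_req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
  next();
});

app.use(express.static('dist'));
app.listen(8080);
```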
Now that I have a good grasp of how synclink works, I think I should be able to finish this on Tuesday or Wednesday. But please go ahead and send your typescript refactors if you'd like @georgestagg, I'll rebase my changes. I think we could also merge #17 before you do that if you like, it's probably ok for the dev version to momentarily not work on Safari.
Okay. I'll merge #26 later today. I'm going to add my typescript refactoring on top of that and then resubmit as a separate PR for you to rebase on.
I think the web worker & comlink changes might break plotting. We can't render to a canvas from a web worker because Safari doesn't support OffscreenCanvas yet. If I have time I'm going to try reworking things so that rather than rendering directly in the R graphics device we send messages through the queue system. That way it can be up to the app to decide what to do with them and we can just render to a Canvas on the main thread until OffscreenCanvas becomes more widely available.
Oh yes, I haven't looked into plotting. Good idea regarding passing plots as messages through the output queue.
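Roughly what that could look like (the message shape and drawing ops here are invented for the sketch):

```typescript
// Invented message shape, just to illustrate replaying plot operations on a
// main-thread canvas instead of drawing from the worker.
type PlotOp =
  | { op: 'line'; x1: number; y1: number; x2: number; y2: number }
  | { op: 'text'; x: number; y: number; str: string };

// Worker side: the graphics device callbacks only enqueue messages.
function emitPlotOp(op: PlotOp): void {
  self.postMessage({ type: 'plot', data: op });
}

// Main thread side: replay the operations onto a regular canvas.
declare const worker: Worker; // the webR worker created elsewhere
const canvas = document.querySelector('canvas') as HTMLCanvasElement;
const ctx = canvas.getContext('2d')!;

worker.addEventListener('message', (event: MessageEvent) => {
  if (event.data?.type !== 'plot') return;
  const op = event.data.data as PlotOp;
  if (op.op === 'line') {
    ctx.beginPath();
    ctx.moveTo(op.x1, op.y1);
    ctx.lineTo(op.x2, op.y2);
    ctx.stroke();
  } else {
    ctx.fillText(op.str, op.x, op.y);
  }
});
```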
I've merged my TypeScript refactoring into main now. Thanks again for your suggestions.
Interestingly I removed the XHR fetch from stdin with these changes, but the Safari call stack issue is still present with the same error message. As an experiment, I artificially reduced the call stack by adding an Rprofile.site to the virtual file system containing options(expressions=25). With this, the default packages fail to load due to the now limited number of allowed nested calls, but stdin and stdout seem to work OK. I can run simple statements like x=5;x in the REPL and get back a result. That suggests the issue is somewhere else.
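(In case it's useful for reproducing that experiment, the profile can be written into Emscripten's virtual file system before R starts. The R_HOME path below is a guess and depends on how the webR image is laid out.)

```typescript
// Assumed shape of the Emscripten Module object; only FS.writeFile is used.
declare const Module: { FS: { writeFile(path: string, data: string): void } };

Module.FS.writeFile(
  '/usr/lib/R/etc/Rprofile.site',    // assumed location of R_HOME/etc
  'options(expressions=25)\n'
);
```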
I do wonder if WebKit has some kind of bug meaning Web Workers or WASM get less stack than intended. We have seen Pyodide hit the same problem with Safari, and my experiments suggest that just options(expressions=75) is enough depth to break things; the default is 5000. Perhaps one day I should try building the latest version of https://github.com/WebKit/WebKit and tweaking some values...
If it continues to be an issue we should add a switch in the REPL app for choosing between using the worker via Comlink or just importing from webR.ts directly.
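Something along these lines, perhaps (module paths, names, and the interface are hypothetical):

```typescript
import * as Comlink from 'comlink';

interface WebRLike {
  init(): Promise<void>;
  evalR(code: string): Promise<string>;
}

async function createWebR(useWorker: boolean): Promise<WebRLike> {
  if (useWorker) {
    const worker = new Worker(new URL('./webr-worker.ts', import.meta.url), {
      type: 'module',
    });
    return Comlink.wrap<WebRLike>(worker) as unknown as WebRLike;
  }
  // Main-thread fallback for Safari. Note this sketch doesn't address how
  // blocking reads from stdin would work without a worker.
  const { WebR } = await import('./webR');
  return new WebR();
}
```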
Thanks for investigating. I was worried it would come to this.
That's a bummer because I don't think we can easily support both blocking on stdin and yielding to the main thread in a nice abstracted way. I'll think more about it (once I'm done with some R maintenance work).
I built a custom version of WebKit today that allocated 8MB to worker threads, rather than the default in macOS of 512KB, and that fixed the crash. That suggests that a small call stack on worker threads does indeed cause the problem.
I checked the source code for Chromium and there they explicitly override the default on macOS and allocate more stack, citing deep recursive threads as a problem, just like our R eval() function.
It looks like WebKit supports launching threads with a larger stack by setting a define, but I think Safari is compiled to just stick with the macOS default of 512KB. It's possible that the WebKit people aren't aware of such problems caused by a 512KB stack (or are aware but not as worried about it as the Chromium team), so in theory we could submit a bug report asking very nicely if they'd turn it up on Darwin by default, citing what Chromium has done as precedent.
In the meantime, I had a hunch that bcEval() might be what is causing us to use so much stack. Some analysis tools from emscripten told me that it is certainly the biggest function in the webR binary by a long way, and I suspect that it is allocating a lot of local variables. I tweaked the R patches (see #29) to always just skip it (falling back to eval()), and after building with that change webR in a worker using Comlink seems to work OK in Safari.
I'm going to do a full rebuild to confirm that, but the signs seem to be good!
Awesome! Thanks for unblocking us on this front. For the longer term, I'll look into allocating the array for the Emscripten stack memory on the main thread, and share it to the worker. Then we can remove the bcEval() patch and we'll be able to run stack-intensive R functions.
Do you think it's worth filing an issue upstream to increase Safari's stack memory in worker threads?
I'll look into allocating the array for the Emscripten stack memory on the main thread, and share it to the worker.
Is that possible? From my understanding it's a limitation in the actual stack allocated to the thread at the native OS level, rather than a limit within the JS/WASM virtual machine.
Then we can remove the bcEval() patch and we'll be able to run stack-intensive R functions.
I'm not too worried about the bcEval patch; we already apply heavier patches and there's an R_disable_bytecode variable nearby that also does a similar thing. IIRC I did some experiments a while back and I found byte compiling some R code to actually be a little slower in emscripten anyway. I think with all the JS/WASM indirection already going on adding another layer of bytecode doesn't really help.
Do you think it's worth filing an issue upstream to increase Safari's stack memory in worker threads?
Possibly, but let's leave it for the moment. We can always reconsider asking later.
I get about 200 recursions using x = function(i){print(i); x(i+1)};x(0) compared to about 400 in Chrome, so behaviour is now closer between the browsers. Probably at this point it's better to work on reducing our own frame size than pestering upstream.
I think once I'm confident enough to merge #29 I'll close this issue, update the live version of the webR REPL, and open one or two new issues to keep track of any remaining points raised in this thread.
IIRC I did some experiments a while back and I found byte compiling some R code to actually be a little slower in emscripten anyway. I think with all the JS/WASM indirection already going on adding another layer of bytecode doesn't really help.
Interesting. Apparently wasm is structured in a quite peculiar way (an AST-like structured stack machine rather than an SSA-based register machine); maybe that doesn't interact well with the bytecode interpreter.
Is that possible? From my understanding it's a limitation in the actual stack allocated to the thread at the native OS level, rather than a limit within the JS/WASM virtual machine.
Hmm, yes, you're probably right. I was thinking about the stack that Emscripten manages in the linear memory to store stack values (IIUC it has to because stack variables can't have their address taken in wasm). But the stack overflow must be about the actual wasm stack for active function calls.
Currently the webR UI blocks until the current R command finishes. This makes webR less comfortable to use and prevents printf-debugging in case of crashes or freezes.
I've ported a trimmed down version of the webR app to a web worker as a POC and it seemed to work fine: https://github.com/lionel-/webr-build/commit/143630525bde5317137b89c8c5de0aeca64a568c. I'll look into porting this work to the main repo so that we can more easily printf-debug.