tinygo-org / tinygo

Go compiler for small places. Microcontrollers, WebAssembly (WASM/WASI), and command-line tools. Based on LLVM.
https://tinygo.org
Other
15.26k stars 901 forks source link

Channels block forever in wasm #1091

Closed bradleypeabody closed 4 years ago

bradleypeabody commented 4 years ago

Channels do not appear to be working properly (both send and receive block forever) in wasm. Here's the specific info on the issue and then a more involved question below with which I'm trying to understand and decide what my next steps are for TinyGo+Vugu support.

Anything I've tried where main() is doing:

ch := make(chan bool, 1)
// do some JS setup
<-ch

And then some JS event happens and calls back into the Go code and a channel send i.e. ch <- true happens - the send blocks, and the the receive never resumes.

This reproduces it: https://github.com/bradleypeabody/tgtestb On master the channels block as described above. On dev it does not appear to arrive back into Go from JS at all, which is probably an unrelated problem.

The issue is very similar to this one: https://github.com/tinygo-org/tinygo/issues/791 - and I tried using the exact code from that issue and does not work (blocks forever, same as the example above).


With that said, I have two main questions:

  1. Is there already a good solution on the table for the "stack unwinding" problem in TinyGo+wasm? As I understand the problem (and I am by no means an expert on this so please correct me if I have this wrong), the Wasm instruction set does not have actual jump instructions like je, etc. but instead forces logic into condition blocks, and you can't manually save and restore the stack to implement these sorts of "and control transfers over to this context in this case" situations. (Although it certainly appears like there is active discussion about mechanisms that could affect this or improve the situation e.g. https://github.com/WebAssembly/design/issues/1340 https://github.com/WebAssembly/exception-handling/issues/105)

I believe in a previous discussion, Asyncify was mentioned as a possible solution.

If something like that isn't feasible, it might not even make sense to try to fully solve this until wasm provides some underlying suspend/resume mechanism that can be used for this (because that would make things much easier and saner and probably offload much of the work to LLVM). On the other hand, realistically the necessary Wasm features could take years to make their way all the way into desktop browsers.

The reason I'm bringing all this up is I'm trying to get an idea of how well supported these cases will be, at least for the time being.

  1. I can solve the issues I'm currently running into with Vugu by adding a path does not involve channels and instead is driven entirely by callbacks from the browser. From the perspective of the TinyGo team - should I proceed with this or should I wait for these blocking related issues to be resolved? My workaround would be instead of using channels like the above to signal that a re-render is to occur in main() after a DOM event is processed (like a button click), the render function would simply be called directly as the final step in the DOM event handling code. This removes the channel and I'm pretty sure it will work well for most Vugu programs.

I will run into a similar situation when figuring out how to support client http calls from Vugu programs. But, those can also be made to work by implementing them as function callbacks, and not requiring the "this code looks like it blocks but under the hood the Go runtime parks it somewhere and resumes once your response is ready" stuff. In practical terms it means performing an http request under TinyGo will not read like:

res, err := vhttp.Get("some-url")
// handle res and err

But instead like:

vhttp.Get("some-url", func(res *http.Response, err error) { ... } )

This is of course reminiscent of JS promises and would exist for essentially the same reason. I'm not a huge fan of it, but at the same time I do have to make a realistic call on how to proceed with Vugu. And I have no problem doing the above it's something that will exist for a few months or a year and allows things to move forward for the time being.

Sorry for the lengthy post and tucking this all together in one issue, but hopefully it helps provide the full context.

aykevl commented 4 years ago
  1. Is there already a good solution on the table for the "stack unwinding" problem in TinyGo+wasm?

Unfortunately, right now there is no good solution. What we do right now is use tons of hacks around C++ coroutines where we determine at compile time which functions might block, and if they do we convert that function and all parents to stackless coroutines. (This simplifies things, there are a few optimizations that make this a bit less invasive). This works in many cases, but apparently not all.

Right now there doesn't seem to be any concrete proposal for proper stack switching, just a bunch of idea (unlike garbage collection, which does have a concrete proposal but is still lying dormant). I don't think there will be a good stack switching solution any time soon. (By the way, thanks a lot for the links to those issues!).

Regarding the issue you have with Vugu, I suspect the issue is the interaction between concurrency and callbacks, not necessarily concurrency in itself (which appears to be working in many cases).

Honestly I don't know what the best way forward would be. Perhaps @jaddr2line has an idea? Maybe the fix is not all that difficult.

niaow commented 4 years ago

So, currently JS callbacks spawn goroutines because something in syscall/js decided to call select{} when it encounters an error.

Currently, when a coroutine function completes, it places the caller on the runqueue. In order for that to be resumed, something needs to call the scheduler. My current best guess is that whatever this is, it is not invoking the scheduler afterwards.

Re: coroutines being messy I am currently working on a mechanism which attempts to re-implement goroutines in a cleaner way to replace the current coroutines scheduler, however it has been taking a while since it is a very complex problem.

bradleypeabody commented 4 years ago

Thanks for the feedback, it makes sense on all of the above. Definitely sounds tricky. It seems like LLVM coroutines attempt to address a key part of the problem (moving data out from the stack to the heap and using a condition/switch to skip back to a specific resume point). But I get that there is a really big gap between what coroutines do and the what is required to implement the various Go language mechanisms involved with the scheduler and the impact on how various situations with functions are lowered.

@jaddr2line If you can hit me up when there is an update I should try or any other prediction on when you're able to work on this, it would be most appreciated.

Otherwise I'll give this some thought on the Vugu side of things and if no other solution becomes apparent in the coming weeks I'll see if there is an elegant way to add in a call path for Vugu rendering (and later the http Get stuff mentioned above) that is just straight function calls with no channels and roll with that for now. I can support method signatures for both approaches (channels vs straight function calls) and just use build tags to enable/disable stuff - so once the scheduling is nailed it will be a trivial change to switch back over.

niaow commented 4 years ago

If you can hit me up when there is an update I should try or any other prediction on when you're able to work on this, it would be most appreciated.

I am kinda busy right now and do not think I will have time for a while. Can you add a call to scheduler() after this goroutine start and see if it fixes the problem: https://github.com/tinygo-org/tinygo/blob/a9ba6ebad9ad85cacd9674a1af9229489a811f68/src/runtime/runtime_wasm.go#L58

bradleypeabody commented 4 years ago

Awesome! Yes, that worked like a charm. I've made a PR in case that's helpful. This resolves the immediate problem, and I just tested some simple click events that toggle elements on the page and perform the associated Vugu rendering loop and so far everything is working perfectly. Thanks for the help on this. I'll post an update once I've tried out some of the more complex cases to see if they are working.

deadprogram commented 4 years ago

This was released with v0.14.0 so now closing. Please reopen if needed. Thanks!