Closed seagreen closed 4 years ago
Hold up: I may have it figured out...
I haven't figured it out, but by adding the threadDelay
I was able to intermittently get it working, maybe 1/4 of the time or less. So something is weird.
Interesting, do you encounter the same problem when you run the original routing example?
Nope, the routing example works great.
Added a duplicate channel for logging to the terminal.
Also I learned that by going into the Network tab on Chrome and clicking the websocket, you can see everything that's sent over it.
Between these two things I can now see the problem more clearly:
Both times you click the back button the frontend invokes the callback and sends a message over the websocket. The message makes it to the channel and is printed by the logging dupchan, but "popstate callback received" isn't printed and the text doesn't change.
orr
is eating my channel events! Dastardly!
orr
is eating my channel events! Dastardly!
That's not good :( If that's indeed a concur
bug we should try and narrow it down before submitting a bug report upstream. I'm wondering why the other routing example works though.
I'm very confused about why the routing example works as well.
You can see it happening though in this example if you replace liftIO (readChan chan)
with liftIO (readChan chan <* writeFile "test" "test"
; click the text twice; and then click the back button. The message is sent to the server, the test
file appears, but no message is sent back to the frontend.
Check it out :open_mouth:
I was able to reproduce this. Yeah, really strange :/ I'll dig into it.
Reproduced it again by accident while working on my 2d example: https://github.com/seagreen/concur-replica-example-2d/commit/8227ca10e6c775c3749ca3f042501ddf5c5f070d
If you replace liftIO
with liftSafeBlockingIO
in this line:
r <- orr [ Left <$> f a, Right <$> liftIO (wait res) ]
it should work. There are some subtle MVar
interactions deep in the concur-core
guts which somehow deadlock in certain situations. Needs more digging.
Found the problem: liftIO
creates its own thread in which it runs the IO action (which is a leftover from the previous STM implementation of concur
). However, whenever a thread wins the orr
race and subsequently all other threads are killed, the inner liftIO
thread is left untouched. Those zombie threads are then left consuming the Chan
, and if the scheduling is unfortunate, will swallow incoming messages.
This must also be the reason why the async
solution worked - either async
kills all spawned children threads, or the scheduling is somehow affected and newer threads consume the message first. I assume it's the latter, since the other routing example has exactly the same problem but seems to work.
Should be fixed in https://github.com/pkamenarsky/concur/commit/e06c121d8cb506625a7a3ac5d6798e3185de424f. Keep in mind, all such theories are actually nothing more than somewhat educated guesses when it comes to multithreading, i.e. I'd place this theory barely above flat earth in terms of legitimacy :D
EDIT: concur-replica
commit: 7192dd5a49feb90996c14f80aa67e9d4df4bf4a1.
I'd place this theory barely above flat earth in terms of legitimacy :D
Haha, awesome.
I do have a concern with the current concur
implementation: as long as it's not based on STM it seems like there will always be race conditions, right? Eg if I do left <$> reacChan chan1 <|> Right <$> readChan chan2
, there's a chance that both the readChans could happen, but only one value gets returned.
Basically: I'm curious why concur
switched away from STM-- any chance you know?
Basically: I'm curious why concur switched away from STM-- any chance you know?
This will give you a bit of context: https://github.com/ajnsit/concur/pull/16. Basically - stm
is slow and there was a GHC bug that made concur
segfault (!) with certain configurations.
Re race conditions, you are probably right. It should be possible to revert to the old STM implementation if the GHC bug fix has been backported to all current GHC versions, but this needs some thought.
Ok, let's see.
https://github.com/pkamenarsky/concur/commit/dc5347b35c79654d58fa95716f425ee7248504fb introduces liftSTM
and swaps the internal MVar
for a TMVar
.
https://github.com/pkamenarsky/concur-replica/commit/267cbf35c92262c4506cfd88046fd7a51820f04c is the corresponding concur-replica
commit.
This should work as expected now:
liftSTM (Left <$> readTChan chan1 <|> Right <$> readTChan chan2)
Excellent! I've made a new PR based off of a rebased example: https://github.com/pkamenarsky/concur-replica/pull/31
Excellent! I've made a new PR based off of a rebased example: #31
Why not rebase this one here and force push? This way the discussion won't be lost.
Why not rebase this one here and force push? This way the discussion won't be lost.
Sure thing! Done.
Now that this example supports the back button the next thing I'm going to add to it is actual routing, but I'm planning to do that in a later PR.
@pkamenarsky I'm stuck on getting URL path based routing working.
If you're interested, here are the reproduction steps:
pushing state
in your terminal each time.popstate callback complete
in the browser console, but won't see the expectedpopstate callback received
in the terminal.I've seen an indication this might be a race condition, because once (with
location.pathname
switched out forlocation.hash
to be more like your original example) I did see "popstate callback recieved", though I haven't been able to reproduce it.