pkamenarsky / concur-replica

Server-side VDOM UI framework for Concur
BSD 3-Clause "New" or "Revised" License
139 stars 20 forks source link

Website example with path based back button #29

Closed seagreen closed 4 years ago

seagreen commented 4 years ago

@pkamenarsky I'm stuck on getting URL path based routing working.

If you're interested, here are the reproduction steps:

  1. Run the app
  2. Click a few times to see the URL advance. You'll see pushing state in your terminal each time.
  3. Click the back button. You'll see popstate callback complete in the browser console, but won't see the expected popstate callback received in the terminal.

I've seen an indication this might be a race condition, because once (with location.pathname switched out for location.hash to be more like your original example) I did see "popstate callback recieved", though I haven't been able to reproduce it.

seagreen commented 4 years ago

Hold up: I may have it figured out...

seagreen commented 4 years ago

I haven't figured it out, but by adding the threadDelay I was able to intermittently get it working, maybe 1/4 of the time or less. So something is weird.

pkamenarsky commented 4 years ago

Interesting, do you encounter the same problem when you run the original routing example?

seagreen commented 4 years ago

Nope, the routing example works great.

seagreen commented 4 years ago

Added a duplicate channel for logging to the terminal.

Also I learned that by going into the Network tab on Chrome and clicking the websocket, you can see everything that's sent over it.

Between these two things I can now see the problem more clearly:

Both times you click the back button the frontend invokes the callback and sends a message over the websocket. The message makes it to the channel and is printed by the logging dupchan, but "popstate callback received" isn't printed and the text doesn't change.

seagreen commented 4 years ago

orr is eating my channel events! Dastardly!

pkamenarsky commented 4 years ago

orr is eating my channel events! Dastardly!

That's not good :( If that's indeed a concur bug we should try and narrow it down before submitting a bug report upstream. I'm wondering why the other routing example works though.

seagreen commented 4 years ago

I'm very confused about why the routing example works as well.

You can see it happening though in this example if you replace liftIO (readChan chan) with liftIO (readChan chan <* writeFile "test" "test"; click the text twice; and then click the back button. The message is sent to the server, the test file appears, but no message is sent back to the frontend.

seagreen commented 4 years ago

Check it out :open_mouth:

pkamenarsky commented 4 years ago

I was able to reproduce this. Yeah, really strange :/ I'll dig into it.

seagreen commented 4 years ago

Reproduced it again by accident while working on my 2d example: https://github.com/seagreen/concur-replica-example-2d/commit/8227ca10e6c775c3749ca3f042501ddf5c5f070d

pkamenarsky commented 4 years ago

If you replace liftIO with liftSafeBlockingIO in this line:

r <- orr [ Left <$> f a, Right <$> liftIO (wait res) ]

it should work. There are some subtle MVar interactions deep in the concur-core guts which somehow deadlock in certain situations. Needs more digging.

pkamenarsky commented 4 years ago

Found the problem: liftIO creates its own thread in which it runs the IO action (which is a leftover from the previous STM implementation of concur). However, whenever a thread wins the orr race and subsequently all other threads are killed, the inner liftIO thread is left untouched. Those zombie threads are then left consuming the Chan, and if the scheduling is unfortunate, will swallow incoming messages.

This must also be the reason why the async solution worked - either async kills all spawned children threads, or the scheduling is somehow affected and newer threads consume the message first. I assume it's the latter, since the other routing example has exactly the same problem but seems to work.

Should be fixed in https://github.com/pkamenarsky/concur/commit/e06c121d8cb506625a7a3ac5d6798e3185de424f. Keep in mind, all such theories are actually nothing more than somewhat educated guesses when it comes to multithreading, i.e. I'd place this theory barely above flat earth in terms of legitimacy :D

EDIT: concur-replica commit: 7192dd5a49feb90996c14f80aa67e9d4df4bf4a1.

seagreen commented 4 years ago

I'd place this theory barely above flat earth in terms of legitimacy :D

Haha, awesome.

I do have a concern with the current concur implementation: as long as it's not based on STM it seems like there will always be race conditions, right? Eg if I do left <$> reacChan chan1 <|> Right <$> readChan chan2, there's a chance that both the readChans could happen, but only one value gets returned.

seagreen commented 4 years ago

Basically: I'm curious why concur switched away from STM-- any chance you know?

pkamenarsky commented 4 years ago

Basically: I'm curious why concur switched away from STM-- any chance you know?

This will give you a bit of context: https://github.com/ajnsit/concur/pull/16. Basically - stm is slow and there was a GHC bug that made concur segfault (!) with certain configurations.

Re race conditions, you are probably right. It should be possible to revert to the old STM implementation if the GHC bug fix has been backported to all current GHC versions, but this needs some thought.

pkamenarsky commented 4 years ago

Ok, let's see.

https://github.com/pkamenarsky/concur/commit/dc5347b35c79654d58fa95716f425ee7248504fb introduces liftSTM and swaps the internal MVar for a TMVar.

https://github.com/pkamenarsky/concur-replica/commit/267cbf35c92262c4506cfd88046fd7a51820f04c is the corresponding concur-replica commit.

This should work as expected now:

liftSTM (Left <$> readTChan chan1 <|> Right <$> readTChan chan2)
seagreen commented 4 years ago

Excellent! I've made a new PR based off of a rebased example: https://github.com/pkamenarsky/concur-replica/pull/31

pkamenarsky commented 4 years ago

Excellent! I've made a new PR based off of a rebased example: #31

Why not rebase this one here and force push? This way the discussion won't be lost.

seagreen commented 4 years ago

Why not rebase this one here and force push? This way the discussion won't be lost.

Sure thing! Done.

Now that this example supports the back button the next thing I'm going to add to it is actual routing, but I'm planning to do that in a later PR.