phoenixframework / phoenix

Peace of mind from prototype to production
https://www.phoenixframework.org
MIT License
21.45k stars 2.88k forks source link

LiveView falls back to Longpolling too often #5720

Closed PJUllrich closed 9 months ago

PJUllrich commented 9 months ago

Environment

Elixir: 1.15.7
Erlang: 26.1.2
Phoenix: 1.7.11
LiveView: 0.20.5
Bandit: 1.2.0
macOS: Sonoma 14.3

Actual behavior

Ever since I upgraded to Phoenix 1.7.11 and enabled longpolling by default, LiveView falls back to longpolling 60-90% of the time instead of opening a WebSocket. I experienced this with my production servers running on Fly, but also in Localhost. I updated the longPollFallbackMs option from the recommended 2500 to 5000, 10_000, and even 15_000, but could observe no change in behaviour. Previously, I had no issues with establishing a WebSocket connection in any of my environments and if I disable longpolling by removing the longPollFallbackMs option, it establishes a WebSocket connection without issues.

I also experienced an issue with Bandit, which might be related. Please see the error and my configuration here: https://github.com/mtrudel/bandit/issues/299

Expected behavior

I expect LiveView to fall back to longpolling almost never, especially not in Localhost.

chrismccord commented 9 months ago

It sounds like what is happening is in development a new request triggers the phoenix CodeReloader plug which compiles project changes, and those changes take longer than 2.5s, which triggers the longpolll fallback. Note that longpoll fallback is memoized in sessionStorage on the client for the remainder of the browser session, so that is my guess on why increasing it to seemingly very high numbers had no effect - it was already memoized to fallback. One option could be to check the window.pathname for localhost and conditionally fallback so you don't fallback in dev, but I don't think this is something we need to change internally. Sounds like everything is working as intended. You can verify by clearing your session storage or restarting the browser and seeing that it does not fallback with the high values. Thanks!

mtrudel commented 9 months ago

@PJUllrich for the sake of completeness, can you verify that this also happens on Cowboy, all else being equal?

PJUllrich commented 9 months ago

@mtrudel ha, that's interesting. I mentioned that I saw longpolling on localhost which surprised me. I switched out Bandit for Cowboy 2.10.0 and the issue was gone! I enabled Bandit 1.2.1 again, and it fell back to longpolling again. Once more, I switched to Cowboy and saw longpolling probably because the phx:longpolling flag in sessionStorage was set to true. I deleted that flag, refreshed and had a WebSocket connection again. Curious.

mtrudel commented 9 months ago

@PJUllrich so does it relate to the server being run or not?

PJUllrich commented 9 months ago

I'm sorry I don't fully understand your question. But if you mean whether the issue is related to Bandit vs Cowboy, my suspicion would be yes, it might

mtrudel commented 9 months ago

That's what I was getting at; thanks!

I can't see where this would be the case but I'll look at it this morning!

mtrudel commented 9 months ago

Just spun up a new 1.7.11 test app and everything seems to work as expected on Bandit. A couple of things to try and diagnose:

chrismccord commented 9 months ago

Are you using Brave by chance?

PJUllrich commented 9 months ago

Not on my laptop right now but I experience the issue in the latest Arc on macOS and latest Safari in iOS. Know sure whether it helps but I experience this issue on a multitude of LiveView routes both locally and in production (Fly)

Maybe unrelated, but another issue I experience is that the websocket connection never completes on the connected load. So, the static page renders and then the mount/3 and handle_params/3 callback execute for a second time but the browser never receives that update, so never fully establishes a websocket connection. The loading bar just keeps on loading until the timeout is reached and the website refreshes. I think that's why I experience so many longpoll fallbacks.

I saw some errors when I connected my iPhone to my laptop and opened the iOS Safari console there. I can send you a screenshot in a few hours. In Arc, I didn't see any logs unfortunately.

PJUllrich commented 9 months ago

I also disabled any ad blockers. Same issue. Since I switched back to Cowboy, all seems to work fine but I haven't tested it thoroughly yet.

mtrudel commented 9 months ago

Maybe unrelated, but another issue I experience is that the websocket connection never completes on the connected load. So, the static page renders and then the mount/3 and handle_params/3 callback execute for a second time but the browser never receives that update, so never fully establishes a websocket connection. The loading bar just keeps on loading until the timeout is reached and the website refreshes. I think that's why I experience so many longpoll fallbacks.

If it's possible, I'd love to see a screenshot of the websocket request in your web inspector's Network tab. It's possible this is more related to how Bandit's websock implementation calls in to Phoenix's Socket layer, rather than a protocol level websocket issue.

PJUllrich commented 9 months ago

@mtrudel I'm sorry to have wasted your time on this, but I tried to replicate this issue today and I just don't see it anymore which is weird. I tried it on Safari and Arc on macOS and Safari on iOS but nothing shows up. I've seen some errors in Safari on macOS when I navigated from a LiveView to a Controller route and back, but I don't think it's related. I did a forced recompile of my project maybe that resolved it? It's weird because I've seen WebSocket time outs also in production, so that rules it out, but I just don't see the issue anymore.

I'll keep an eye on this and will ping you here if I see the websocket timeout issues again, but I think we can move on for now.

These were the errors that I've seen on Safari 17.3 on macOS Sonoma 14.3, but I think that's Safari related and not Bandit related.

Navigating from a LiveView route to a Controller route CleanShot 2024-02-14 at 20 51 18@2x

Navigating from a Controller route to a LiveView route CleanShot 2024-02-14 at 20 25 39@2x

The errors for the websocket ending with vsn=2.0.0 are related to the live-reloading, so no issue with Bandit or LiveView. I also see these errors with Cowboy. There's only one error related to the "real" websocket in the first picture when I navigate away from a LiveView with a <.link navigate={~p"/pricing"}> link. But I doubt that that's causing the issue that I've reported.

Again, sorry for wasting your time on this. I'll ping you if I see it again.

mtrudel commented 9 months ago

Good(ish?) to hear. If it does come up again, the most useful thing to see would be the relevant network views from your inspector.

walter commented 8 months ago

Thanks for your work on Longpoll fallback and Bandit.

I'm back-enabling Longpoll fallback for an existing app and I'm seeing something similar in dev with Chrome and Safari (cycling failures to connect when longPollFallbackMs is declared). I mention this not to get issue re-opened as I'm guessing it very specific to my app, but as a data point.

I also switched from Cowboy to Bandit. I haven't switched back to see if that fixes things yet.

It's the end of my work week, so won't be able to give details until my Monday, but will try to follow up then.

walter commented 8 months ago

I mention this not to get issue re-opened as I'm guessing it very specific to my app, but as a data point.

It's the end of my work week, so won't be able to give details until my Monday, but will try to follow up then.

This is mainly for contextual reference for anyone that comes across this.

In the end, the issue that has persisted for me doesn't seem to be related to LongPoll fallback. The symptoms just happened to appear when I went through the upgrade process for Phoenix and LV and cropped up when I went to check out that things were working.

Chrome and Firefox on a Mac work as expected now. Chrome came right pretty quickly after I cleared browser data and made sure my elixir build and js build were in order.

However, Safari 17.3.1 with 1Blocker enabled (tried with it paused and then extensions turned fully off) does not like my app's login form page (fine in other browsers) and keeps reconnecting endlessly (over web sockets, with Bandit or Cowboy) with this in the JS log:

[Log] push: lv:phx-F7lYsoPqFz6j_2NB phx_join (4, 4)  {redirect: undefined, url: "http://localhost:4000/login", params: Object, &} (app.js, line 32650)
{redirect: undefined, url: "http://localhost:4000/login", params: Object, session: "...", static: "...", &}Object
[Error] no component found matching phx-target of 1  undefined
    logError (app.js:33162)
    withinTargets (app.js:36035)
    exec_push (app.js:33382)
    (anonymous function) (app.js:33349)
    forEach
    (anonymous function) (app.js:33348)
    forEach
    exec (app.js:33343)
    (anonymous function) (app.js:37439)
    (anonymous function) (app.js:37506)
    owner (app.js:37334)
    withinOwners (app.js:37338)
    (anonymous function) (app.js:37505)
    (anonymous function) (app.js:37795)
    debounce (app.js:33767)
    (anonymous function) (app.js:37794)
    owner (app.js:37334)
    withinOwners (app.js:37338)
    debounce (app.js:37792)
    (anonymous function) (app.js:37504)
    (anonymous function) (app.js:37807)
    focus
    focusFormForStreamlinedLogin
[Log] transport: WebSocket connected to ws://localhost:4000/live/websocket?_csrf_token=...&_track_static%B0%D=http%A%F%Flocalhost%A4000%Fassets%Fapp.css&_track_static%B1%D=http%A%F%Flocalhost%A4000%Fassets%Fapp.js&_mounts=0&_live_referer=undefined&vsn=2.0.0  undefined (app.js, line 32650)
[Log] push: phoenix heartbeat (undefined, 10)  {} (app.js, line 32650)
[Log] receive: error lv:phx-F7lYsoPqFz6j_2NB phx_reply (4)  {status: "error", response: {reason: "stale"}} (app.js, line 32650)
[Log] phx-F7lYsoPqFz6j_2NB error: unauthorized live_redirect. Falling back to page request -   {reason: "stale"} (app.js, line 33180)
[Log] phx-F7lYsoPqFz6j_2NB socket: disconnect for page nav -   undefined (app.js, line 33180)
[Log] phx-F7lYsoPqFz6j_2NB destroyed: the child has been removed from the parent -   undefined (app.js, line 33180)
[Log] push: lv:phx-F7lYsoPqFz6j_2NB phx_leave (4, 12)  {} (app.js, line 32650)
[Log] channel: leave lv:phx-F7lYsoPqFz6j_2NB  undefined (app.js, line 32650)
[Log] channel: close lv:phx-F7lYsoPqFz6j_2NB 4  undefined (app.js, line 32650)
[Log]  destroyed: the child has been removed from the parent -   undefined (app.js, line 33180)
[Log] push: lv: phx_leave (undefined, 13)  {} (app.js, line 32650)
[Log] channel: leave lv:  undefined (app.js, line 32650)
[Log] channel: close lv: undefined  undefined (app.js, line 32650)
[Log] receive: ok phoenix phx_reply (10)  {status: "ok", response: {}} (app.js, line 32650)
[Log] transport: connected to primary after  1 (app.js, line 32650)
[Log] receive: ok lv:phx-F7lYsoPqFz6j_2NB phx_reply (12)  {status: "ok", response: {}} (app.js, line 32650)
[Log] receive: ok lv: phx_reply (13)  {status: "ok", response: {}} (app.js, line 32650)
[Log] transport: close  CloseEvent {isTrusted: true, wasClean: true, code: 1000, &} (app.js, line 32650)
CloseEvent {isTrusted: true, wasClean: true, code: 1000, reason: "", type: "close", &}CloseEvent

I tried Safari with an empty generated app and it was fine.

Any insight appreciated, but it's definitely something specific to my app not a bug in LongPoll support.