observablehq / feedback

Customer submitted bugs and feature requests
Socket sometimes doesn't reconnect? #475

Closed: visnup closed this issue 1 year ago

visnup commented 2 years ago

Describe the bug

Admittedly a bit nebulous of a bug report, but…

It seems there were times, while investigating connectivity issues with @saneef and @mootari, when the notebook could have reconnected but seemed to have given up. While this state may be valid (internally it could map to the terminal aborted state), it was unclear whether we were in it and what actions, if any, the user should take. For example, a toast or dialog explaining the current state and possible remedies, like reloading the page, would be helpful.

To Reproduce

This has been difficult for me to reproduce reliably. I know this happens to @saneef regularly (daily), but I haven't been able to see it myself.

Expected behavior

If this does map to a terminal state where the socket won't attempt any more reconnections, we could intervene with a toast, a dialog, or something else.
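The thread doesn't show the client's actual reconnection code, so the following is only a sketch under assumptions: the state names, the retry limit, the exponential backoff, and the `ReconnectTracker` and `onAborted` names are all hypothetical. It illustrates the expected behavior described above: once retries run out, the client enters an explicit terminal state and notifies the user, for example with a toast suggesting a reload.

```typescript
// Hypothetical connection states; "aborted" is terminal: no further retries.
type ConnState = "connected" | "reconnecting" | "aborted";

interface ReconnectOptions {
  maxAttempts: number; // assumed retry budget before giving up
  baseDelayMs: number; // assumed delay before the first retry
  onAborted: () => void; // e.g. show a toast suggesting a page reload
}

class ReconnectTracker {
  state: ConnState = "connected";
  attempts = 0;
  opts: ReconnectOptions;

  constructor(opts: ReconnectOptions) {
    this.opts = opts;
  }

  // Call when the socket closes or a reconnect attempt fails. Returns the
  // delay (ms) before the next attempt, or null once retries are exhausted.
  onDisconnect(): number | null {
    if (this.state === "aborted") return null;
    if (this.attempts >= this.opts.maxAttempts) {
      this.state = "aborted";
      this.opts.onAborted(); // make the terminal state visible to the user
      return null;
    }
    this.state = "reconnecting";
    const delay = this.opts.baseDelayMs * 2 ** this.attempts; // exponential backoff
    this.attempts++;
    return delay;
  }

  // Call when a connection (re)opens; resets the retry budget.
  onConnected(): void {
    this.state = "connected";
    this.attempts = 0;
  }
}
```

Keeping the terminal state explicit is what makes it possible to hang a toast or dialog off it, instead of leaving the user to guess whether retries are still happening.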

saneef commented 1 year ago

I'm still living with this. 😄

One pain point: when I attach a file while the socket is disconnected and the toast isn't shown, everything looks fine, but the attached file is lost when I refresh the page. Is it possible to show an error when a file upload fails?

I tried working over a VPN to the US and still get the issue. I have no idea where the problem is.

Is there a network tool I can use to debug all the connections (HTTP, WS, …) from my computer?

visnup commented 1 year ago

I'm going to try to dig more into this. To restate a few things to make sure I remember:

saneef commented 1 year ago

this currently happens across browsers, but mostly in Firefox?

Yes, across browsers. The toast appears sooner in Firefox. In Chrome, there were times I didn't see the toast even when the WS connection was closed.

refreshing can fix it and sometimes successfully saves changes afterwards?

Yes, refreshing the page can fix it, temporarily.

changes are sometimes lost

For me, a code change has only been lost once so far, so I'm not sure that's related to this issue. I'm refreshing a lot (every 5–10 minutes) because of this issue, and I haven't lost any code changes other than that one instance.

An attachment uploaded while the spinner is active (WS down) ends up lost on the next refresh. This has happened to me many times, and I can reproduce it consistently.

other connectivity seems unaffected (navigating to other pages, other sites)

Yeah, other navigations are not affected.

Thanks, @visnup, for looking into this.

visnup commented 1 year ago

An attachment uploaded while the spinner is active (WS down) ends up lost on the next refresh. This has happened to me many times, and I can reproduce it consistently.

Ah, I'd consider this a separate issue worth looking into. I'm going to focus on the root cause of why it goes into this state for you in the first place, though. If we can fix that, the file attachment issue becomes less urgent, but it should still be tracked.

visnup commented 1 year ago

Yes, across browsers. The toast appears sooner in Firefox. In Chrome, there were times I didn't see the toast even when the WS connection was closed.

@saneef Can you double check your browser versions for me?

visnup commented 1 year ago

Right now I'm adding logging and diagnostics in this code path to alert us when this happens. Hopefully that will give us some clues about the circumstances or environment in which it occurs.

saneef commented 1 year ago

Browser versions:

Firefox 104.0.2 (64-bit)
Safari 15.6.1
Chrome 106.0.5249.21 dev

visnup commented 1 year ago

OK, I can now see whenever you've run into this toast (I count about 3 in the past couple of hours). At least 3 other users have experienced it too. Next I'm going to figure out what debugging info I can collect when logging this exception.

visnup commented 1 year ago

After a day of collecting data, we've seen this event 29 times across 23 users. The bad news, though, @saneef, is that you're by far the person who experiences it most:

Everyone else has only hit it once. 😞 I have another PR that adds even more logging, which will hopefully provide more insight…

This isn't normalized by editing activity, though, so it could just be that you see it more than others because you edit more.

visnup commented 1 year ago

I'm still investigating this, trying to find some commonality between the events… We now have a solid chunk of data on it across 100 users. Sorry it's slow going.

visnup commented 1 year ago

OK! We think we've finally narrowed this down to an extension that was blocking the Page Visibility API, which the client state machine depends on to reawaken after the user switches tabs.
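The Page Visibility API exposes `document.visibilityState` and fires a `visibilitychange` event on `document` when a tab is hidden or shown again. Below is a minimal sketch of how a client state machine might use it to reawaken, assuming a hypothetical `wake` callback (for example, one that checks the socket and reconnects); it is not Observable's implementation. The `VisibilitySource` interface is also hypothetical and exists only so the logic can run outside a browser; in a page you would pass `document` itself. If an extension suppresses the event, `wake` simply never runs: nothing throws, and the client never reawakens after a tab switch.

```typescript
// Minimal shape of the parts of `document` used here (hypothetical interface).
interface VisibilitySource {
  visibilityState: string; // e.g. "visible" or "hidden" in browsers
  addEventListener(type: "visibilitychange", listener: () => void): void;
}

// Hypothetical hook: run `wake` each time the tab becomes visible again.
function onTabVisible(doc: VisibilitySource, wake: () => void): void {
  doc.addEventListener("visibilitychange", () => {
    // The event fires on both hide and show; only react to show.
    if (doc.visibilityState === "visible") wake();
  });
}
```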

Now that we know that, maybe we could add warnings when we detect that APIs we depend on are disabled. But I'm closing this for now!
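One possible shape for the warning suggested above, as a hedged sketch: check that the Page Visibility properties exist and have the expected types, treating a missing property or a throwing getter as blocked. The function names and warning text are hypothetical. The thread doesn't say how the extension blocked the API, and a presence check like this can't catch an extension that spoofs plausible values (for instance, always reporting `"visible"`), so it would flag only some cases.

```typescript
// Read a property defensively: an extension might delete it, replace it,
// or install a getter that throws. Any of those counts as "unavailable".
function hasType(read: () => unknown, expected: string): boolean {
  try {
    return typeof read() === expected;
  } catch {
    return false;
  }
}

// Hypothetical presence check for the Page Visibility API on a document-like object.
function missingVisibilityApis(doc: object): string[] {
  const d = doc as { visibilityState?: unknown; hidden?: unknown };
  const missing: string[] = [];
  if (!hasType(() => d.visibilityState, "string")) missing.push("document.visibilityState");
  if (!hasType(() => d.hidden, "boolean")) missing.push("document.hidden");
  return missing;
}

// Hypothetical warning hook; returns true if a warning was issued.
function warnIfVisibilityApisMissing(doc: object, warn: (msg: string) => void): boolean {
  const missing = missingVisibilityApis(doc);
  if (missing.length === 0) return false;
  warn(
    `Unavailable browser APIs: ${missing.join(", ")}. ` +
      "An extension may be blocking them, which can stop this page from reconnecting.",
  );
  return true;
}
```

In a page, this could run once at startup, e.g. `warnIfVisibilityApisMissing(document, (msg) => console.warn(msg));`, or route the message to the same toast used for connection errors.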

visnup commented 1 year ago

A subtle change, but that error toast now contains a "Learn more" link that goes to https://observablehq.com/@observablehq/how-saving-works#connectivity which mentions the page visibility API. I don't have high hopes that will help a lot of people immediately, but it may catch a few more experiencing similar issues.