parallaxinc / solo

BlocklyProp without the wires
MIT License
6 stars 6 forks source link

Downloads work, then seem to halt until closing and starting new browser session #460

Open PropGit opened 4 years ago

PropGit commented 4 years ago

Testing against v1.5.0-b6.

Downloads work fine, then eventually fail to perform (no error messages, just a stuck download). The symptoms are the "Load to RAM" dialog appears frozen and the Launcher never receives a request to perform a download. This continues for each download attempt thereafter until the browser tab is closed and a new browser session is started (no restart of Launcher is needed to achieve success again).

image

Here's a video where I'm exploring and describing this issue. https://photos.app.goo.gl/u9zMbmmtHukd85xq7

Something important to note: A browser refresh does not solve the problem... only closing the browser tab and starting a new browser tab session solves the problem (as demonstrated in the video as well).

This would seem to indicate that something causing this state is being preserved in the browser session (despite the refresh) but doesn't survive the switch to a new browser tab.

PropGit commented 4 years ago

Update: I just verified that the first step Launcher takes when receiving a download request (load-prop message) is to log the event and when this situation occurs, no "Received Propeller Application..." message appears in the log.

zfi commented 4 years ago

So far, I am unable to reproduce this error. What version of the Launcher is being used?

The browser console is showing this sequence during each Load to RAM session. I would like to see if there is a deviation to the pattern when the load fails to complete.

instrument.js:110 07:30:59.106 Loading program to RAM.
instrument.js:110 07:30:59.106 Load connection is active
instrument.js:110 07:30:59.109 Requesting compiler service
instrument.js:110 07:30:59.149 Receiving compiler service results

It is important to note that the issue appears to have started after a browser refresh. The sequence should be similar to the log below.

07:33:43.522 Launching the editor
instrument.js:110 07:33:43.535 Init international messages
instrument.js:110 07:33:43.538 Init event handlers
instrument.js:110 07:33:43.539 New Project dialog handlers initialized.
instrument.js:110 07:33:43.539 Open Project dialog handlers initialized.
instrument.js:110 07:33:43.540 Import Project dialog handlers initialized.
instrument.js:110 07:33:43.550 Finding a client
instrument.js:110 07:33:43.550 Connecting to Launcher client
instrument.js:110 07:33:43.550 In BPLauncherConnection
instrument.js:110 07:33:43.551 Init terminal communications
instrument.js:110 07:33:43.558 Current project and new project are equal?  true
instrument.js:110 07:33:43.558 Inserting project Project-1a
instrument.js:110 07:33:43.558 Clearing project initial state
instrument.js:110 07:33:43.559 Setting project state for Project-1a
instrument.js:110 07:33:43.559 Setting up the workspace in blockly core
instrument.js:110 07:33:43.559 setupWorkspace: Preparing Blockly workspace
instrument.js:110 07:33:43.559 Clearing the current Blockly root workspace
instrument.js:110 07:33:43.560 setupWorkspace: Render content for: Project-1a
instrument.js:110 07:33:43.621 Displaying project blocks
instrument.js:110 07:33:43.669 Connection is: open, URL: ws://localhost:6009/.
instrument.js:110 07:33:43.844 Websocket client/launcher found - version 1.0.4
PropGit commented 4 years ago

BP Launcher v1.0.1 on Windows. I don't think there's any changes in 1.0.2, 1.0.3, or 1.0.4 that would rectify this.

I'll retest to see if I can duplicate and capture browser logs.

PropGit commented 4 years ago

I got it to happen again.

Upon fresh session, all downloads were successful. Then I refreshed, downloads were still successful. Then I waited and eventually saw a GET https://localhost:6009/ net::ERR_TIMED_OUT message. All download attemps after that fail in the way noted by this issue.

Here's the Launcher log after the last successful download and including the time period where the browser noted a timeout error:

image

Here's the full session browser logs:  utility.js:404 08:14:49.650 Connection is: open, URL: ws://localhost:6009/. utility.js:404 08:14:49.655 Trying to connect to the BP Client. utility.js:404 08:14:49.656 establishBPConnection: entry utility.js:404 08:14:49.717 Websocket client/launcher found - version 1.0.1 utility.js:404 08:14:51.634 Loading program to RAM. utility.js:404 08:14:51.634 Load connection is active utility.js:404 08:14:51.684 Requesting compiler service utility.js:404 08:14:52.082 Receiving compiler service results utility.js:404 08:14:58.825 Loading program to RAM. utility.js:404 08:14:58.826 Load connection is active utility.js:404 08:14:58.858 Requesting compiler service utility.js:404 08:14:59.404 Receiving compiler service results utility.js:404 08:15:19.962 Failed to open client connection utility.js:404 08:15:19.963 Updating toolbar jquery.js:10099 GET https://localhost:6009/ net::ERR_TIMED_OUT utility.js:404 08:15:39.176 Loading program to RAM. utility.js:404 08:15:39.177 Load connection is active utility.js:404 08:15:39.206 Requesting compiler service utility.js:404 08:15:39.406 Receiving compiler service results utility.js:404 08:15:39.407 Sending program binary to the BlocklyProp Client jquery.js:10099 POST https://localhost:6009/load.action net::ERR_TIMED_OUT

PropGit commented 4 years ago

Note in the Launcher logs, there is no variance by more than 2 ms from the 5 second interval in which it is sending the port list message to the browser. The 5-digit leading number is SSMMM (Seconds and Milliseconds).

Also, the time since the port list message prior to the last successful download to the time of the next port list message following the successful download is 1.626 seconds.

zfi commented 4 years ago

Thanks for the details. It looks like Solo detects the problem immediately after receiving the elf payload from the compiler. The 'Failed to open client connection' message means that Solo gave up trying to connect to the Launcher via a web socket and was looking for the older BP Client. The remainder of the messages are failures because there actually is not a BP Client running. I would expect to see these.

The Launcher log seems to indicate that the connection to Solo is actually still alive. This would be consistent with the absence of web socket error messages in the Solo console logs. So, it looks like Solo lost track of the web socket connection and is now banging it's head on a BP Client connection that doesn't exist. Nice.

PropGit commented 4 years ago

That's amazing. Yes, Launcher still has a connection as the Launcher logs indicate and (surprisingly despite your diagnosis) Solo indicates from the editor message at the top that the connection still exists during this time.

PropGit commented 4 years ago

Can you change the message

Websocket client/launcher found - version 1.0.1

To read:

Websocket launcher found - version 1.0.1

Since there is no client with websocket ability?

Also, the "Trying to connect to the BP Client."

Is that really client/launcher that it means to say there?

zfi commented 4 years ago

I am missing something because I am unable to replicate the error. The console logs do not appear to be from SoloCup. On a browser refresh, the console log should contain these first few entries:

WARNING: Sentry is disabled.
utility.js:404 10:31:38.136 Launching the editor
utility.js:404 10:31:38.289 Init international messages
utility.js:404 10:31:38.293 Init event handlers
utility.js:404 10:31:38.293 New Project dialog handlers initialized.
utility.js:404 10:31:38.294 Open Project dialog handlers initialized.
utility.js:404 10:31:38.294 Import Project dialog handlers initialized.
utility.js:404 10:31:38.299 Finding a client
utility.js:404 10:31:38.299 Connecting to Launcher client

I am using version 1.0.4 of the Launcher from the installer, not the Chrome plugin. Let's figure out where we are and then I'll dig into this again.

PropGit commented 4 years ago

Oops... Looks like my copy of browser log content didn't select everything. Here's the stuff at the top, up to and including the first line that I showed before.

08:14:47.886 sentry.js:38 WARNING: Sentry is disabled. 08:14:47.886 utility.js:404 08:14:47.886 Launching the editor 08:14:48.021 utility.js:404 08:14:48.021 Init international messages 08:14:48.030 utility.js:404 08:14:48.030 Init event handlers 08:14:48.031 utility.js:404 08:14:48.031 New Project dialog handlers initialized. 08:14:48.031 utility.js:404 08:14:48.031 Open Project dialog handlers initialized. 08:14:48.032 utility.js:404 08:14:48.032 Import Project dialog handlers initialized. 08:14:48.046 utility.js:404 08:14:48.045 Finding a client 08:14:48.047 utility.js:404 08:14:48.046 Connecting to Launcher client 08:14:48.047 utility.js:404 08:14:48.047 In BPLauncherConnection 08:14:48.048 utility.js:404 08:14:48.048 Init terminal communications 08:14:48.053 utility.js:404 08:14:48.053 Current project and new project are equal? true 08:14:48.053 utility.js:404 08:14:48.053 Inserting project StopLightGame(BP) 08:14:48.053 utility.js:404 08:14:48.053 Clearing project initial state 08:14:48.054 utility.js:404 08:14:48.053 Setting project state for StopLightGame(BP) 08:14:48.054 utility.js:404 08:14:48.054 Setting up the workspace in blockly core 08:14:48.054 utility.js:404 08:14:48.054 setupWorkspace: Preparing Blockly workspace 08:14:48.055 utility.js:404 08:14:48.054 Clearing the current Blockly root workspace 08:14:48.056 utility.js:404 08:14:48.055 setupWorkspace: Render content for: StopLightGame(BP) 08:14:48.072 blockly.js:1830 [Violation] Added non-passive event listener to a scroll-blocking 'wheel' event. Consider marking event handler as 'passive' to make the page more responsive. See https://www.chromestatus.com/feature/5745543795965952 Blockly.bindEventWithChecks @ blockly.js:1830 Blockly.WorkspaceSvg.createDom @ blockly.js:1325 Blockly.createMainWorkspace @ blockly.js:1812 Blockly.inject @ blockly.js:1807 initToolbox @ editor.js:1523 setupWorkspace @ editor.js:624 insertProject @ editor.js:1887 (anonymous) @ editor.js:149 mightThrow @ jquery.js:3762 process @ jquery.js:3830 08:14:48.083 blockly.js:1830 [Violation] Added non-passive event listener to a scroll-blocking 'wheel' event. Consider marking event handler as 'passive' to make the page more responsive. See https://www.chromestatus.com/feature/5745543795965952 Blockly.bindEventWithChecks @ blockly.js:1830 Blockly.Flyout.init @ blockly.js:1706 Blockly.Toolbox.init @ blockly.js:1752 Blockly.init @ blockly.js:1816 Blockly.inject @ blockly.js:1808 initToolbox @ editor.js:1523 setupWorkspace @ editor.js:624 insertProject @ editor.js:1887 (anonymous) @ editor.js:149 mightThrow @ jquery.js:3762 process @ jquery.js:3830 08:14:48.115 blockly.js:1830 [Violation] Added non-passive event listener to a scroll-blocking 'wheel' event. Consider marking event handler as 'passive' to make the page more responsive. See https://www.chromestatus.com/feature/5745543795965952 Blockly.bindEventWithChecks @ blockly.js:1830 Blockly.Flyout.init @ blockly.js:1706 Blockly.Trashcan.init @ blockly.js:1253 Blockly.init @ blockly.js:1817 Blockly.inject @ blockly.js:1808 initToolbox @ editor.js:1523 setupWorkspace @ editor.js:624 insertProject @ editor.js:1887 (anonymous) @ editor.js:149 mightThrow @ jquery.js:3762 process @ jquery.js:3830 08:14:48.618 utility.js:404 08:14:48.618 Displaying project blocks 08:14:48.683 jquery.js:3828 [Violation] 'setTimeout' handler took 662ms 08:14:48.683 [Violation] Forced reflow while executing JavaScript took 182ms 08:14:48.723 [Violation] Forced reflow while executing JavaScript took 31ms 08:14:49.579 blockly.js:928 [Violation] 'setTimeout' handler took 740ms 08:14:49.650 utility.js:404 08:14:49.650 Connection is: open, URL: ws://localhost:6009/.

zfi commented 4 years ago

Can you tell me anything about the [Violation] messages? I do not see those in my Chrome sessions.

Also, I want to verify the procedure you are using to get to the connection failure.

This is what I am doing but my results are all positive.

PropGit commented 4 years ago

I'm using v1.0.1 from the installer (on Windows). I haven't wrapped v1.0.4 for Windows yet. I checked out the code changes between v1.0.1 and v1.0.4 and don't see anything that should affect this. Most of it was script for the Mac installer creation and internal handling that creates and delivers a port list specific to a socket id (because each browser tab could have a different "preferred port" setting.

zfi commented 4 years ago

I have added a second watchdog to the connection service that looks to see when we received the last port list from the Launcher or client. If we don't see the update within 10 seconds, Solo will close the current connection and re-establish a new connection to the Launcher or Client. This will appear in v1.5.0-RC-1.

zfi commented 4 years ago

With reference to the console log messages, I updated the message when Solo connects to the Launcher to read, "BlocklyProp Launcher v1.0.4 detected.".

The "Trying to connect to the BP Client." message is actually correct. This message is emitted when Solo is unsuccessful in establishing a web socket connection with a Launcher client. If you are running the Launcher, you should NEVER see this in the logs.

PropGit commented 4 years ago

Can you tell me anything about the [Violation] messages? I do not see those in my Chrome sessions.

They appear to be happening on line 1830 of blockly.js. Here's the link to the Chrome page describing their recommendation: https://www.chromestatus.com/feature/5745543795965952

If you're not seeing it on your side, I'm wondering if I have an extension installed that's looking at the page and causing this problem. I have a reader extension, for example, that let's me instantly remove all images and get to just the textual content of a site when I need to... useful when all the annoying eye-candy adds pop up when I'm researching stuff.

There's also one that's a setTimeout problem on line 928 of blockly.js and one on 3828 of jquery.js... both of which took > 500 ms, so Chrome is probably complaining of user experience issues.

PropGit commented 4 years ago

Also, I want to verify the procedure you are using to get to the connection failure.

  • Load a project and compile it at least, say 10 times.
  • Reload the browser - Is this a soft refresh ( CMD-R on MacOS), a hard refresh, (Shift-Cmd-R on MacOS) or clearing the cache through the browser settings?
  • Run a few more downloads to the device, say another 10?
  • Let the system sit idle for some time (seconds or minutes?)
  • Observe the jQuery error

This is what I am doing but my results are all positive.

I only downloaded a couple times before refreshing, and it was a soft refresh, then a couple more downloads and wait (only took about 50 seconds or so before error arrived, I think). I'll retry now.

PropGit commented 4 years ago

Having major trouble that seems Launcher-related... digging in.

zfi commented 4 years ago

I found more details on the [Violations] messages. There is a setting in the Console window of the Chrome dev tools, Default Levels, that will turn these messages on or off. You are correct in the assumption that these are UI interactions that are, in Google's view, taking too long. And they believe that anything that takes more than 50ms is too dang long. Must be nice.

PropGit commented 4 years ago

I'll have to look into the previous message.

I've duplicated it a few times now. Here's a short video showing exactly how.

https://photos.app.goo.gl/ZGJRkpLv3QUXMcFb9

PropGit commented 4 years ago

I have added a second watchdog to the connection service that looks to see when we received the last port list from the Launcher or client. If we don't see the update within 10 seconds, Solo will close the current connection and re-establish a new connection to the Launcher or Client. This will appear in v1.5.0-RC-1.

I so badly want to get away from this polling/heartbeat thing whenever we can think clearly enough to do it.

A full Propeller 1 program may take up to 6 or so seconds to download. The port list "service" is halted during downloads... so if Solo was sitting at 4.9 seconds since it's last received message, and there's a large download the occurs right then, it may be > 12 seconds or so (in it's eyes) before it receives another port list.

Can we make Solo extend the timeout or better yet have Solo restart its internal countdown the moment a download request is sent to Launcher?

zfi commented 4 years ago

I think we are at the place we need to be to make that change. There are at least two ways to do this. First, my preferred process is that Solo sends a ping-like message to Launcher and gets a pong-like response. Then Solo knows that we're good to go. The alternate, but more shady process is to rely on the port updates to determine that there is a valid connection.

A future update of Launcher would send a port list when requested and then only send it again when something has changed in the list or when Solo asks for the list again. This is a trust fall unless we have a ping/pong interface in the Launcher - or something equivalent.

PropGit commented 4 years ago

First, my preferred process is that Solo sends a ping-like message to Launcher and gets a pong-like response. Then Solo knows that we're good to go.

This is actually possible right now (and has been before these recent versions even) by Solo sending a "hello-browser" message and Launcher sends it back a "hello-client" response with it's version number. It's perfectly acceptable in the current messaging to send a "hello-browser" at nearly any time (but would suggest not making it more frequent than 5 seconds).

We must make sure that Solo does not send a "hello-browser" message during a download request; it can resume requests after download is complete. Similarly, "hello-browser" requests during continuous terminal or graphing activity is redundant and may interfere. Maybe the rule is, if there hasn't been any message from Launcher in some amount of time, send a "hello-browser" message just to check if it's still there; otherwise, the other message activity serves as knowledge of a healthy connection.

The alternate, but more shady process is to rely on the port updates to determine that there is a valid connection.

Shady, yes... this is what we're doing already and for some reason it's not working perfectly all the time.

A future update of Launcher would send a port list when requested and then only send it again when something has changed in the list or when Solo asks for the list again. This is a trust fall unless we have a ping/pong interface in the Launcher - or something equivalent.

Right. Currently, Launcher doesn't detect port changes by user activity and OS event messaging... that would be key to making this perfect. I can; however, make it keep track of the list it last sent and won't send a new list until there's a difference (or another request from Solo).

To make that work well with older browser apps, we should extend the "port-list-request" message to include an action field. The valid values for action can be "on-interval" or "on-demand" which means....

messages

type: "port-list-request", action: "on-interval" - Launcher sends immediate port list message in response, then periodic port list messages on a timed basis (every 5 seconds; except during download processing).

type: "port-list-request", action: "on-demand" - Launcher sends immediate port list message in response, then only sends future port list messages when a change has occurred, or upon receiving another port-list-request.

The lack of an action field is the same as having an "on-interval" action field, to stay compatible with current browser apps in use.

---OR---

Instead of action, perhaps the message should include an interval field, which contains a number.

zfi commented 4 years ago

The intent for the ping/pong message was to verify that we have an active web socket connection before sending a load command to the Launcher. We use it to set up the initial connection now. I would extend the use cases to do the connection validation before the load to RAM or EEPROM commands are issued. If we are in agreement on this point, I think I can remove the remaining connection watchdog timers in Solo. It will be magic.

On the port list proposal, sending an interval value is quite flexible and addresses all the known use cases. The JSON that Solo is sending now is this:

{
    "type": "port-list-request",
    "msg": "port-list-request"
}

The new JSON payload to request a port list with updates every 5 seconds would look like this:

{
    "type": "port-list-request",
    "action": "5"
}

For reasons I do not yet understand, the current JSON payload does not use the key 'action' but instead uses the key 'msg'.

zfi commented 4 years ago

It looks like this has morphed into three separate issues. I am pushing this to the 1.5.1 release to give us more to to sort all of this out.

PropGit commented 4 years ago

You're correct.

I've split out the port-list-request enhancement into Issue #464 now.

I've split out the ping-pong check into Issue #465 now.

PropGit commented 3 years ago

This may be related to the same cause that is addressed by v1.5.7 and v1.5.8. Issue #514.