pixiebrix / pixiebrix-extension

PixieBrix browser extension
https://www.pixiebrix.com
GNU Affero General Public License v3.0
83 stars 22 forks source link

MV3 sidebar loads very slowly and shows black panel when toggling from toolbar #8168

Open twschiller opened 5 months ago

twschiller commented 5 months ago

Describe the bug

To Reproduce

Steps to reproduce the behavior:

  1. Go to a web page
  2. Toggle the MV3 sidebar

Actual behavior

Expected behavior

Screenshots/Loom

Discussion/Diagnosis

Desktop (please complete the following information):

twschiller commented 5 months ago

@fregante this seems to be correlated to number of frames or page loading (which would suggest either a messenger or memory footprint issue). See video: https://pixiebrix.slack.com/archives/C0436P48QHY/p1712871319744539?thread_ts=1712862465.698609&cid=C0436P48QHY

Any recommendations for what you'd look at? @fungairino is going to add timing for the promises involved in handling the browser action click

@fungairino you might also benchmark speed when using the Show Sidebar brick vs. the browser action to isolate where the slowdown is occuring

twschiller commented 5 months ago

The timings for the Chrome API don't look bad on my local build, even though there's a delay in any content showing:

image.png
twschiller commented 5 months ago

@fungairino some additional findings by peppering in the logPromiseDuration in some places:

So the bottleneck definitely appears to be the before the HTML/JS gets rendered

image.png

More measurements:

image

It looks like a period of time after the call to chrome.sidePanel.open but before the HTML is fetched. So the ping waiting for the panel to load times out

If I comment out the CSS/JS from the HTML file, the file loads fairly instantaneously. That would suggest the content is taking a bit of time to load/is blocked, and Chrome isn't painting until the document is ready: image

🤷 However, that seems inconsistent with the behavior I saw when clicking the reload button

fregante commented 5 months ago

Unless we have a sync loop somewhere, this looks like a browser rendering bug. The sidebar page appears when fully loaded after a black screen, I don't have any other explanation for that.

I'd suggest looking at Chrome's task manager to see if the delay matches with one of its processes taking 100% of the CPU.

All messages are logged for the messenger so if they're a problem you can quickly see that.

fungairino commented 5 months ago

One suspect is that the sidebar.js file is 3.3mb which Chrome might just be taking more time to load sometimes. I think looking into a code-splitting solution to make this less beefy would be a good idea.

fungairino commented 5 months ago

I took a quick look at our webpack config, and how we load the sidebar react app and wasn't able to make any significant changes to reduce the size or effectively split up the bundle (I got a small improvement by adding LodashModuleReplacementPlugin, but didn't get around to fully test this change). This probably requires an extended deep dive that can be prioritized if this issue becomes more prevalent.

I tried using statoscope to more effectively dig into the bundle for the sidebar, but was having issues generating the expected stats for the report to work.

twschiller commented 5 months ago

If the ping times out, the user sees the following error message

image.png
twschiller commented 5 months ago

@fungairino as part of CI we generate webpack bundle information. Did you have a chance to look there?

image.png

IIRC, there's potentially some gotchas in how tree-shaken dependencies are reported

In general, a 3.3mb parsed local file is not a big deal, so I wouldn't expect it to be the root cause. I would recommend we look for more reliable signs of file size being an issue before focusing on reducing bundle size

fungairino commented 5 months ago

That report is created locally after a build, and I took a look through it, but it's hard to tell how dependencies are being pulled into the sidebar.js entry. I was hoping a different webpack analyzer tool statoscope could tell me more information on what was causing certain dependencies to be pulled in.


In general, a 3.3mb parsed local file is not a big deal, so I wouldn't expect it to be the root cause. I would recommend we look for more reliable signs of file size being an issue before focusing on reducing bundle size

I'd generally agree, but I was driven to look into this because of your comment on the latency going away when you removed the js file:

If I comment out the CSS/JS from the HTML file, the file loads fairly instantaneously. That would suggest the content is taking a bit of time to load/is blocked, and Chrome isn't painting until the document is ready:

fregante commented 5 months ago

The size is not the problem — or rather it's not a recent, urgent problem; the file is big locally because it includes the sourcemap, but on production it's much smaller.

fungairino commented 5 months ago

That file size (3.3mb) is what I observed from the current production beta extension from the play store. But yea, I understand that it's been that way for a while.

fungairino commented 4 months ago

Related datadog error details here for the "The sidebar did not respond in time" message

https://app.datadoghq.com/logs/error-tracking/issue/3f5a3168-b964-11ee-a803-da7ad0900002?query=service%3Apixiebrix-browser-extension&fromUser=true&refresh_mode=sliding&view=spans&from_ts=1713119518836&to_ts=1713205918836&live=true

fungairino commented 4 months ago

@grahamlangford and I did a deep dive looking into the code path for opening the sidebar starting from clicking on the browser action button, until the sidebar was done mounting, but we were not able to identify anything that looked promising in terms of a likely cause of latency or a blocking call.

One thing to call out at the end of our session was that we noticed that the slow laggy behavior was especially apparent when reloading the extension while having a lot of busy tabs open (ex. 10+ tabs of fox news open). This consistently caused the entire browser (and computer) to take a significant hit in performance, spiking cpu usage. In this state, clicking on the browser action icon caused the panel to open in the grey panel loading state for 10+ seconds before the sidebar was done mounting.

Afterwards, I added tracing debug statements to try to narrow down exactly where in our code the latency was really hitting, and came up with the following information (exact location of trace debug statements here: https://github.com/pixiebrix/pixiebrix-extension/commit/c44a33d6cba58a6bcbffe775cdb37d245e0b4cba)

Tracing with "clean" browser (only a few tabs, opening sidebar from browser action after extension reloaded)

0:00.000 sidebar_tracing: openSidePanelMv3 setOptions
0:00.000 sidebar_tracing: openSidePanelMv3 chrome.sidePanel.open before
0:00.016 sidebar_tracing: openSidePanelMv3 chrome.sidePanel.open after
0:00.064 sidebar_tracing: html loading
0:00.723 sidebar_tracing: sidebar init start
0:02.371 sidebar_tracing: sidebar react App mounted
0:02.375 sidebar_tracing: sidebar init done

"Busy" browser. 10+ of tabs loading heavy websites (cnn and fox news). Directly after reloading extension

0:00.000 sidebar_tracing: openSidePanelMv3 setOptions
0:00.067 sidebar_tracing: openSidePanelMv3 chrome.sidePanel.open before
0:04.265 sidebar_tracing: openSidePanelMv3 chrome.sidePanel.open after
0:57.972 sidebar_tracing: html loading
1:22.433 sidebar_tracing: sidebar init start
1:22.434 sidebar_tracing: sidebar init - after initRuntimeLogging
1:25.208 sidebar_tracing: sidebar init - after initPerformanceMonitoring
1:25.881 sidebar_tracing: sidebar init - after initToaster
1:26.504 sidebar_tracing: sidebar react App mounted
1:26.617 sidebar_tracing: sidebar init - after ReactDOM.render
1:26.622 sidebar_tracing: sidebar init - after initCopilotMessenger
1:26.623 sidebar_tracing: sidebar init done

Notice that in the busy trace, the most significant point of latency was between the "chrome.sidePanel.open after" statement which was made after the bg script calls to the chrome api to open the sidepanel and the "html loading" statement which was made in the first script loaded in the sidebar.html (loadingScreen.js)

This indicates to me that the point of latency is potentially the internal chrome process for beginning the sidebar html rendering.

grahamlangford commented 4 months ago

Thanks @fungairino, that confirms my suspicions. If it's taking almost a minute for the loadingScreen.js to load, that definitely points to the slow loading being outside our control.

grahamlangford commented 4 months ago

Deprioritizing for now. Once we've gathered more feedback, we may readdress

fungairino commented 4 months ago

We posted for help on the google groups post here: https://groups.google.com/a/chromium.org/g/chromium-extensions/c/_dJvca1ACpA

I tried the suggestion from that post and removed the fonts call, but it did not help. Here is a recording of the issue:

https://github.com/pixiebrix/pixiebrix-extension/assets/3498063/10a42282-770e-4efc-b3ea-19f4d8eaf96a

As noted earlier by @twschiller, this latency is almost not present when you simply remove loading the sidebar.js file, so maybe there's something there we can do.

fungairino commented 4 months ago

While this is currently in icebox, we are going to keep tabs on this issue via datadog. We will monitor how often users are running into timeout errors when opening the sidebar, and add custom events for more closely tracking how long it takes for the sidebar to open.

twschiller commented 1 month ago

@fungairino any updates on the Datadog side? We haven't been seeing it during dogfooding/hearing it reported on client machines

fungairino commented 1 month ago

Checking datadog, it seems we are getting between 100 and 200 errors for "The Sidebar took too long to load. Retry your last action or reopen the Sidebar".

https://app.datadoghq.com/logs?query=service%3Apixiebrix-browser-extension%20env%3Aproduction%20The%20status%3Aerror%20%22the%20sidebar%20took%20too%20long%20to%20load%20retry%20your%20last%20action%20or%20reopen%20the%20sidebar%22%20&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&fromUser=false&messageDisplay=inline&refresh_mode=sliding&step=86400000&storage=hot&stream_sort=desc&viz=timeseries&from_ts=1718569434477&to_ts=1721161434477&live=true

Anecdotally I haven't seen this issue pop up again for myself.

grahamlangford commented 1 month ago

Checking datadog, it seems we are getting between 100 and 200 errors for "The Sidebar took too long to load. Retry your last action or reopen the Sidebar".

100-200/day is relatively low considering the number of daily users

twschiller commented 1 month ago

100-200/day is relatively low considering the number of daily users

Do we know the number of users impacted per day? E.g., does it look like a problem with 5-10 people's browsers as opposed to a widespread problem?

fungairino commented 1 month ago

About 5 users per day: https://app.datadoghq.com/logs?query=service%3Apixiebrix-browser-extension%20env%3Aproduction%20The%20status%3Aerror%20%22the%20sidebar%20took%20too%20long%20to%20load%20retry%20your%20last%20action%20or%20reopen%20the%20sidebar%22%20&agg_m=count&agg_m_source=base&agg_q=%40usr.email&agg_q_source=base&agg_t=count&cols=host%2Cservice&fromUser=false&messageDisplay=inline&refresh_mode=sliding&step=86400000&storage=hot&stream_sort=desc&top_n=10&top_o=top&viz=timeseries&x_missing=true&from_ts=1718569434477&to_ts=1721161434477&live=true