Open th0rgall opened 1 year ago
Yup, also seeing this on Edge and have not had any luck so far figuring out the cause! I'll leave some notes here from when I was last inspecting the service worker:
chrome.action.onClicked.hasListeners(x => { console.log(x); })
logs true
so it appears that the handler is still registered but clicking on the BAI does not invoke the handler.chrome.action.onClicked.addListener(() => { console.log('clicked!'); })
but it also didn't get called on BAI click.chrome.action.onClicked.dispatch(new Event('click'))
invokes both handlers.It really seems like a browser bug, as unlikely as that probably is. The extension certainly isn't actually frozen or crashed since it still works with the web app in the "broken" state. I'm not sure what could cause the BAI click to not invoke the registered handlers. Maybe the next step is looking at the Chromium source to see what's actually going on under the hood there?
It's been crazy difficult trying to figure this out. I've been trying to slowly debug over the course of days and keep a log but I don't have anything definitive because it's been so hard to reproduce reliably. I think it may have had something to do with the alarms being registered inside an event handler so I did some refactoring there and also fixed the notifications in Chrome/Edge.
Just released version 7.0.1 to the Chrome, Edge, and Firefox stores. I was actually waiting until I caught an extension in the frozen state so that I could see if these changes actually make a difference. Will post an update here in another week or so since it will take so long to confirm one way or the other!
5/13 - removed calls to chrome.alarms.*
5/14 - OK
5/15 - OK
5/16 - OK
5/17 - OK, re-added updateContentParser
5/18 - BAI Failure, fixed updateContentParser 404, BAI Failure, removed updateContentParser listener
5/19 - OK, re-added updateContentParser listener w/o fetches
5/20 - BAI Failure, removed call to chrome.storage.local.get in updateContentParser listener
5/21 - BAI Failure, removed call to SemanticVersion.greatest, BAI Failure, log alarm name only in service worker main listener, BAI Failure, changed listener period to 10 minutes
5/22 - BAI Failure, made listener handler non-async, BAI Failure, removed call to chrome.alarms.create, OK, added no-op top-level debug alarm and listener
5/23 - OK, moved call to chrome.alarms.create to onInstall handler, BAI Failure, reverted all debug changes and moved calls to chrome.alarms.create to top level
5/24 - BAI Failure, removed calls to chrome.alarms.create, OK, re-added updateContentParser alarm
5/25 - OK, re-added ServerApi.checkNotifications alarm, fixed notification icon error, BAI Failure, removed ServerApi.checkNotifications alarm and re-added ServerApi.getBlacklist alarm
5/26 - OK (notifications working - alarms can apparently survive an extension reload?!), added alarm synchronization w/ all alarms active, BAI Failure, removed ServerApi.getBlacklist alarm
5/27 - OK, re-added all alarms, added staggered start times
5/28 - OK
5/29 - OK, switched edge to store version (cannot repro error in chrome)
5/30 - OK
5/31 - OK
6/1 - BAI Failure, 7.0.1 pending edge store review
Wow! Thanks for all this investigation and the improvements. I'm on 7.0.1 on Chrome now, and I haven't recently had a freeze. I'll keep an eye out.
Do I understand correctly that you used your development build extension, together with the production servers, as a daily Readup driver to detect the BAI failure? Or did you find a way to reliably reproduce this issue on Edge?
I'm also wondering if initializing the main events listeners on event page/service worker only AFTER AsyncStore initialization might have a positive effect on this. I tried that here https://github.com/reallyreadit/web/compare/master...async-store-initialization#diff-9be7ec8ec76d8c4d16ac9f02cfa350ac222d428269910782be924313c4a86e23R257-R259 (You have to manually load the "large diff" of src/extension/event-page/main.ts
. Those two lines are the key difference, the rest is mostly increase indentation!).
This might be related, because relevant tab IDs etc. are possibly stored here and used for message passing (_tabs = new AsyncObjectStore
in the WebAppApi). Maybe there is some timing issue now where the BAI handler does not get access to a properly initialized tab store? I've seen these warnings in Chrome at least: https://github.com/reallyreadit/web/blob/e79ea5f46bf5b43c3047259591216c8f0b96ac98/src/extension/common/storage/AsyncStore.ts#L84 (I put in those warnings to see what the behavior would be without asynchronous initalization = the current behavior)
From the Chrome docs link you posted though, it does look like it's a bad practice to register an event listener as a dependency on another async event (which the above branch is doing). That's a point in favour of the current synchronous registration.
I'm on 7.0.1 on Chrome now, and I haven't recently had a freeze. I'll keep an eye out.
I'm finally on 7.0.1 on Edge as of June 9th. So far it's been four days without a freeze but I want to wait at least 30 days before closing this issue.
Do I understand correctly that you used your development build extension, together with the production servers, as a daily Readup driver to detect the BAI failure?
Exactly. I was never able to reproduce the issue using using the development servers in either Chrome or Edge. The only way I could reproduce the issue was to build the extension in production mode so it's pointing at the production servers and load it locally as a temporary development extension.
Maybe there is some timing issue now where the BAI handler does not get access to a properly initialized tab store?
It's possible but it seems so strange that the browser would permanently unregister the BAI handler because of that. Honestly though no matter what the actual underlying cause is it is guaranteed to be a strange one! The storage classes in general feel like a bad original design that was made even worse by the MV3 conversion. I feel like I should have just used the storage API from the beginning and not hidden it in those abstractions. Then we wouldn't have ended up with things like this:
That thinking led me to this question: Even if this issue is resolved in 7.0.1, what do you think about converting the entire extension to MV3-only, moving from callbacks to async/await everywhere, and removing the storage classes? I feel like it would make things a lot simpler. Firefox has had MV3 support since January and I just released a small extension to the Chrome and Firefox stores using MV3 in both of them. There are still some differences in the manifest format but I solved that by just merging the various manifest.*.json
files during the build: https://github.com/jacamera/toggle-leetcode-syntax-highlighting/tree/master/src
Well I got the freeze again after working perfectly for a week straight! I'm going to try moving ahead with the above-mentioned course of action when I can find the time.
Then we wouldn't have ended up with things like this
Yep, it's ugly. I think I put in both with the thought of making a refactor to MV3-only easier in the future: the future-forward code is already present.
Even if this issue is resolved in 7.0.1, what do you think about converting the entire extension to MV3-only,
I think the main reason why I didn't do this initially is that Firefox still doesn't support Chrome-style MV3 service workers:
See the mention here: https://blog.mozilla.org/addons/2021/05/27/manifest-v3-update/ and linked bug report.
However, I might be misunderstanding the complexity of moving everything to MV3. AFAIK, Readup's MV3 "service- worker.ts" script doesn't use any service-worker-specific JS APIs. Maybe, if we just include the same service worker output bundle like so in the Firefox manifest; and pretend it's a MV3 FF "background script":
"background": {
"scripts": [
"./service-worker/bundle.js"
]
},
... it would just work ? 🤔 UPDATE: OMG IT SEEMS TO WORK. Me duplicating that code and splitting the build system was entirely pointless.
In that case, it's a definite GO! So much better to not have duplicated code.
moving from callbacks to async/await everywhere
MV3 introduces Promises on top of callbacks for the WebExt APIs, and I definitely prefer Promises!
Within Promises, have a slight preference for using async/await + try/catch
over .then / .catch
chains, because I think it's more readable; but it's not very strong opinion.
, and removing the storage classes?
Yes, it's probably simpler to use the browser APIs more directly.
By just removing the classes though, we also remove the initializing code that checks for storage quota problems https://github.com/reallyreadit/web/blob/e79ea5f46bf5b43c3047259591216c8f0b96ac98/src/extension/common/storage/AsyncStore.ts#L55-L63
That async initialization code caused the uncertainty I described in my previous reply, since the storage can't be used immediately, and hence dependent Chrome WebExt listeners can't be registered immediately/synchronously.
But is storage quota checking really that important? I don't think Readup uses any significant storage at all now. What if we would just also cut that out, and not worry about having to async-initialize the WebExt stores?
Already made some progress! Should have published my branch earlier but I wasn't sure if it would work. Removed the storage classes too but haven't pushed those changes yet as I'm still combing through and testing. Definitely not worried about the storage quotas. I feel like if we hit a quota it could only be due to some other bug that would also be causing issues. https://github.com/reallyreadit/web/tree/universal-mv3-extension
Version 7.0.2 submitted to all stores as MV3! Very curious to see if any of the changes fix this issue. At the very least the extension code will now be a lot easier to work with.
PS: I tried to run npx rome format .
and it was going to change every line of every file due to the Windows crlf
line endings. It looked like I could override that behavior via Prettier but setting endOfLine
to auto
in .prettierrc
didn't seem to have any effect. https://prettier.io/docs/en/options.html#end-of-line
Welp 7.0.2 still froze up on me after a week or so. Going for process of elimination now so I disabled all the alarms in version 7.0.3.
On Chrome.
https://user-images.githubusercontent.com/11543641/236690002-a9e6c1ee-d65c-40c0-9336-39d0f9a90c8b.mp4
I've had this a few times now, but I'm not sure how to reproduce the situation. It might have something to do with having several windows open, and/or several profiles.
Workaround: switching the extension on/off (= a full reload), which I'm doing (invisibly) at the end of the video.
Update: I noticed I trimmed the wrong part out of this repro video. The actual bug is shown in the logs: clicking the BAI doesn't do anything except logging the "unable to send message" error. The video shows the workaround/refresh case after reloading the extension.