martok / palefill

Inject Polyfills for various web technologies into pages requiring them
https://martok.github.io/palefill/
Mozilla Public License 2.0
79 stars 9 forks source link

Ghost window memory leak on Pale Moon 31.3.0.1 #50

Open cyanlillies opened 1 year ago

cyanlillies commented 1 year ago

This issue seems easiest to reproduce using GitHub. With Palefill installed and enabled, GitHub's navigation is performed mostly via JavaScript (clicking on UI buttons like "Issues" or "Pull requests" or navigating directories in the source tree of a repository provides a blue loading bar at the top of the page and navigation occurs without seemingly ever triggering a new page load in the classic sense).

This works fine, but if you create a fresh profile in Pale Moon 31.3.0.1 with only Palefill installed and load up, for example, this repository, and start clicking on buttons for a while to trigger as many of these soft navigations as possible (Try clicking on the issues tab, then an issue, then back to the issues tab, then to the code tab, and navigate the code for a short while, for example), and then close all of your GitHub tabs completely and go to about:memory and click all of the manual GC buttons (at least two times each for good measure -- GC, CC and minimize memory usage) and then create a memory report by clicking the Measure button, you'll see that every GitHub page you navigated to has now become a ghost window, consuming a chunk of memory forever.

According to various Mozilla Bugzilla posts (https://bugzilla.mozilla.org/show_bug.cgi?id=1143912 for example) these ghost windows are usually caused by misbehaving add-ons maintaining references to unattached windows long after they've been closed. As this issue occurs with only Palefill installed, and only on a site that Palefill operates on, it's possible something is wrong with the way Palefill handles the page and probably deserves some analysis, although it could be a bug in Pale Moon too.

The ghost windows persist if you continue using the browser for other sites long after closing the GitHub tabs. In fact, if you re-use the same tab you used for GitHub to navigate to another site (e.g startpage.com), sometimes all of the pages navigated to from within that tab keep becoming ghost windows, so you can quickly accumulate dozens of ghost windows for every single website you've ever visited in that tab since using GitHub. This can fairly easily result in gigabytes of memory usage.

Issue occurs with Palefill 1.21 and 1.22, I haven't tested any versions below that.

AroKol78 commented 1 year ago

it's only my 5 cents.

I also noticed a problem with eating memory, and in addition, even after loading another page on github (different from github), cyclic jumps of cpu use. The problem also occurs on github-wc-polyfill-1.2.19.[1/2/3]-unofficial.

I partially work around this by blocking one script (||github.githubassets.com/assets/vendors-node_modules_manuelpuyol_turbo_dist_turbo_es2017-esm_js-$script,domain=github.com) obviously this solution is not perfect (many features missing) - but enough to browse github without logging in.

martok commented 1 year ago

Very interesting. I've never really noticed anything getting worse with Palefill, there are lots of memory problems in the current DOM handling anyway. I think this may be related to templates since it is most extreme on Youtube (can't watch a stream with chat or chat replay open for example - each message eats a few megabytes, permanently). Pale Moon gets extremely slow at around 1G used memory that I have to restart anyway...

I see nothing in the Palefill code that even interacts with the content windows. The polyfills obviously do, but they live entirely inside the window and should be freed with them.

Edit: But I can reproduce your test, so there is that.

martok commented 1 year ago

This is weird. Github is the only site I can reproduce the leak on. Even a test rule that just injects std-CustomElements and nothing else is enough to cause it. On the other hand, other sites with soft navigation don't cause it and the customElements polyfill alone on other sites also doesn't. Could be that we hit something rare in UXP itself here?

cyanlillies commented 1 year ago

I think I've been able to [sporadically] reproduce what seems like a similar issue in Proton Mail without Palefill since reporting this issue, so it does seem very likely that it could actually be UXP related. Unfortunately, some research into ghost windows on the Pale Moon forum seems to suggest that flaws like this are pretty common and aren't likely to get solved any time soon in UXP, as Moonchild tries very hard to shift the blame onto the web instead. Maybe it could be possible to implement something within Palefill to work around the issues with the "turbo" script pointed out by AroKol78 (I think maybe blocking the original script and then writing some kind of shim to be able to fix whatever usability features are broken by blocking it would work? I don't think GitHub exactly needs soft navigation...)

This is a very wild guess, but maybe the nature of soft navigation could mean that there's something in the JS that is still holding a reference to elements -- maybe even webcomponent-related stuff -- that would've been in the page before the navigation occurred...or something.

EDIT: Yep, it's UXP alright. I managed to reproduce the issue even without Palefill and dom.webcomponents.enabled set to true (which was enough to unbreak Turbo). Welp...I guess that means it isn't getting fixed.

martok commented 1 year ago

Thank you for investigating further!

I don't think GitHub exactly needs soft navigation...

That should work for now: Turbo-Drive can be disabled by a data attribute.

I think I've been able to [sporadically] reproduce what seems like a similar issue in Proton Mail without Palefill since reporting this issue, so it does seem very likely that it could actually be UXP related.

Do they use custom elements, maybe with a self-hosted polyfill? Or is it entirely independent?

EDIT: Yep, it's UXP alright. I managed to reproduce the issue even without Palefill and dom.webcomponents.enabled set to true (which was enough to unbreak Turbo). Welp...I guess that means it isn't getting fixed.

Oh, so it's not (just) in the polyfill either. Things I can think of:

Some of that would explain why regular custom elements from Youtube also leak, even if they don't cause ghost windows.

I'd say that increases the chances, if we can come up with a simple reproducer. GC issues are hard enough to debug as it is... but being able to say "this specific thing leaks and it shouldn't" helps a lot. May even be able to find a related Mozilla report.

rofl0r commented 1 year ago

being able to say "this specific thing leaks and it shouldn't" helps a lot.

it could also be helpful to determine whether this leak happened in older palemoon versions, and if not bisect the commit in UXP that introduced it.

RamonUnch commented 1 year ago

I do have the same memory leak problem also using other UXP builds. Excuse me if this is redundant but reading through Pale Moon's forums, I came across this thread: https://forum.palemoon.org/viewtopic.php?t=24118 so according to ffomega the leak started between 28.8.4 and 28.9.0.2, he used youtube as a test site he did double check that when downgrading the leak occurred no more. However there is so much new that came with 28.9 that I am not sure if this is relevant.

martok commented 1 year ago

That should work for now: Turbo-Drive can be disabled by a data attribute.

Of course it isn't that easy, but it is possible and indeed, the leaks go away.

Current hypothesis: something about removing custom elements (with attached shadows?) from the DOM doesn't fully detach them, and then that something keeps the Window alive. I mean, there is an obvious cyclic strong reference in attachShadow from Element->ShadowRoot->Element, but this should get discarded regardless once the owning Window is destroyed. Nevermind, the leak also happens without attachShadow polyfilled, as I found out yesterday.

garoto commented 1 year ago

Just to give some feedback on the above commit: it seems to have completely fixed the CPU spikes I was observing since a couple months ago I guess(?) which made scrolling thru simple html pages very jarring. My three day-old Palemoon running process is currently only consuming Palemoon Working Set at 16:49:45: 648 MB after extensive GH and reddit usage, and when before, like mentioned, when it went above the 1GB working set, it was restart time.

So kudos and many thanks for that martok. HUGE fix.

SeaHOH commented 1 year ago

This workaround is modify from 5e1c7d8, just work, is not the preciseness:

  document.addEventListener("DOMContentLoaded", () => {
    const dummy = () => false,
    turboFrameDelegateConstructor = document.createElement("turbo-frame").constructor.delegateConstructor,
    elementIsNavigatable = Turbo.session.elementIsNavigatable,
    willFollowLinkToLocation = Turbo.session.willFollowLinkToLocation,
    shouldInterceptNavigation = turboFrameDelegateConstructor.prototype.shouldInterceptNavigation,
    turboOn = () => {
      Turbo.session.elementIsNavigatable = elementIsNavigatable;
      Turbo.session.formMode = "on";
      Turbo.session.willFollowLinkToLocation = willFollowLinkToLocation;
      turboFrameDelegateConstructor.prototype.shouldInterceptNavigation = shouldInterceptNavigation;
    },
    turboOff = () => {
      Turbo.session.elementIsNavigatable = dummy;
      Turbo.session.formMode = "off";
      Turbo.session.willFollowLinkToLocation = dummy;
      turboFrameDelegateConstructor.prototype.shouldInterceptNavigation = dummy;
    };
    document.addEventListener("visibilitychange", () => {
      if (document.hidden) turboOff(); else turboOn();
    });
    window.addEventListener("beforeunload", () => {
      turboOff();
    });
  }, {once: true});
Vangelis66 commented 1 year ago

This workaround is modify from https://github.com/martok/palefill/commit/5e1c7d86228d9f96c3e53809c38a3eeebad10886, just work,

... Thanks for your code, but how exactly is that supposed to "just work"? As I have already posted sometime ago, "Turbo" is currently BROKEN in palefill even when enabled 😞 😒 ...

garoto commented 1 year ago

If it was posted as a diff file format at the very least.

SeaHOH commented 1 year ago

If it was posted as a diff file format at the very least.

My PC stay at Windows 7, there is no diff command.

"Turbo" is currently BROKEN in palefill even when enabled

That caused by 5e1d254, I do not use palefill, and forgot that. Here is the patch:

        const element = mutation.target;
+       if (element.tagName === "TURBO-FRAME") continue;
        const CEdef = element.__CE_definition;

@martok This patch is work, means that the observer from CE polyfill has not completely stopped working. Maybe there are more issues which caused by it.

garoto commented 1 year ago

My PC stay at Windows 7, there is no diff command.

Not trying to be a jackass here but as a heads-up, there are plenty of working, XP backwards compatible GNU diff Windows ports avaiable. Unxutils and gnuwin32 at sf.net are classic examples. The Windows busybox port here on Github should also provide a working POSIX compliant diff binary at only around ~640KB for the whole package.

Vangelis66 commented 1 year ago

My PC stay[s] at Windows 7, there is no diff command.

I am on Windows Vista SP2 32-bit in my old laptop, https://gnuwin32.sourceforge.net/packages/diffutils.htm has been working fine all those years 😜 ...

Also, diff is available inside git-for-windows and native compilers like MSYS2 (all available for Win7 SP1 πŸ˜„ ) ...

Remember this (generated on Vista SP2 32-bit) ?

Thanks all the same :smile: ...

PS: @garoto beat me to it...

SeaHOH commented 1 year ago

Yes, you are right, I know here it is, but I will not use it. :)

martok commented 1 year ago

None of this has anything to do with the issue here.

martok commented 1 year ago

I've moved the hard disabling into a separate fix for the last release since it used to be lumped in with four others that are completely not needed anymore. Has anyone (ping @cyanlillies) tried repro'ing the original issue in the current version of PM? Is the leak still present? The file navigation is basically soft anyway, so we might run into the same problems?