mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
47.17k stars 9.82k forks source link

Investigate if we could use some caches based on WeakRef #18148

Open calixteman opened 1 month ago

calixteman commented 1 month ago

For the context, in Fenix, several users reported some freeze when zooming/scrolling a pdf:

So my idea is too try to release memory when needed because devices don't always have a lot of memory and because afaik Android OS is trying to smartly manage it (for example in killing some apps in background). During the rendering of a page, we could keep a normal reference on each object to avoid to accidentally remove them. When the rendering is done we could just move them (maybe after a delay we could configure through an option which could be zero with Fenix and 30s for Firefox) and keep a WeakRef (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Memory_management#weakrefs_and_finalizationregistry) on the moved objects.

@Snuffleupagus, @timvandermeij, any positive/negative thoughts ?

Snuffleupagus commented 1 month ago

Note that we already have code that will conditionally clear various main-thread page-data caches once rendering completes, see https://github.com/mozilla/pdf.js/blob/b7b8e5ef788313451e869c2816a87f943bba35c3/src/display/api.js#L1470-L1478 and https://github.com/mozilla/pdf.js/blob/b7b8e5ef788313451e869c2816a87f943bba35c3/src/display/api.js#L2826-L2833

So one thing that you could try experimenting with is setting this.#pendingCleanup = true in GECKOVIEW-builds, once rendering completes, since that might help without having to adding a bunch more complexity (e.g. ´WeakRef`s to the code)?

This should help reclaim memory faster, using existing code; obviously at the expense of needing to re-parse pages more often.

calixteman commented 1 month ago

The main advantage of the WeakRef approach is that we'd be able to adapt to the device, I mean if the device as a low memory (even if it has a lot of memory but a lot of open apps) then a GC is triggered when potentially GC'd be less frequent if the device has enough memory. For example, myself I'm not able reproduce the bugs we've because I've a recent device with a lot of memory and in my case I'd prefer rendering performance over memory use. In general, the problem with hard-coded values is they're well-tuned for some device/usage/users... when they aren't for some others. That said in term of complexity, we should likely just have a cache object with all the caching logic inside instead of spreading some little piece everywhere, which could help us to make some experimentations.

marco-c commented 1 month ago

Trying the WeakRef approach seems most promising for a good balance between perf / memory.

That said, I can reproduce the bug, if you have any quick patch I could try it out to confirm a potential cause for the problem.

Snuffleupagus commented 1 month ago

Since https://github.com/mozilla/pdf.js/issues/18148#issuecomment-2130752448 sounds a little bit like it's written by an AI, please note that you're not allowed to write code using AI for Mozilla projects (based on a general question about AI in the Mozilla Matrix "developers" channel).