mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Very slow rendering of two-layered high-resolution PDF #17427

Open mmatela opened 6 months ago

mmatela commented 6 months ago

Attach (recommended) or Link to PDF file here: https://sbc.org.pl/Content/403918/PDF/539_2671.pdf

Configuration:

Steps to reproduce the problem:

  1. Just open the file

What is the expected behavior? The document should be displayed in a timely manner.

What went wrong? It takes around 20 seconds on my high-end laptop to display the document. It feels around 50 times slower than Chrome's built-in PDF viewer. Changing the zoom level also takes a long time.

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension): https://sbc.org.pl/formats/pdf/web/viewer.html?file=https%3A%2F%2Fsbc.org.pl%2FContent%2F403918%2FPDF%2F539_2671.pdf%3Fhandler%3Dpdf#locale=pl

forensmatt commented 5 months ago

The image is slow because it uses JPEG2000 and JBIG2 compression. If you recompress to JPEG you will see a dramatic improvement in loading speed.

slaFFik commented 2 months ago

@forensmatt Can I ask how you identified that the PDF file contains images in those formats? I think I sometimes have this issue as well, and I am currently trying to understand how to mitigate it, or at least how to instruct users.

forensmatt commented 2 months ago

I use the preflight tools in Acrobat Pro (2020). I'm pretty sure you can get similar info through Foxit Editor, which has a free trial: https://www.foxit.com/pdf-editor/

slaFFik commented 2 months ago

Thank you, @forensmatt.

I have also found that this tool, https://www.metadata2go.com/view-metadata, can show how the images inside a PDF are encoded.

Upload the file (obviously not a confidential one), scroll down to the pdf_images section on the results page, and check the values of the encoding attribute.

If it's jpx or jbig2, that PDF file may benefit from optimization. Ideally, the value should be jpeg with interpolation=false, I guess.
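For files that should not be uploaded anywhere, a rough local check is also possible. The sketch below is only a heuristic, not what metadata2go or Acrobat actually do: it scans the raw PDF bytes for the standard image filter names from the PDF specification (JPXDecode for JPEG2000, JBIG2Decode, DCTDecode for JPEG, CCITTFaxDecode). The function name `count_image_filters` is made up for illustration, and the scan assumes filter names are written out literally in the stream dictionaries, which is usual but not guaranteed.

```python
import re
import sys
from collections import Counter

# Image-related filter names defined by the PDF specification,
# mapped to a human-readable codec label.
IMAGE_FILTERS = {
    b"JPXDecode": "JPEG 2000",
    b"JBIG2Decode": "JBIG2",
    b"DCTDecode": "JPEG",
    b"CCITTFaxDecode": "CCITT fax",
}

# Match "/Name" only when the name is not followed by another
# alphanumeric character, so longer names are not miscounted.
_FILTER_RE = re.compile(
    rb"/(" + b"|".join(IMAGE_FILTERS) + rb")(?![A-Za-z0-9])"
)


def count_image_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of image filter names in raw PDF bytes (heuristic)."""
    counts = Counter()
    for match in _FILTER_RE.finditer(pdf_bytes):
        counts[IMAGE_FILTERS[match.group(1)]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_filters.py file.pdf
    with open(sys.argv[1], "rb") as fh:
        for codec, n in count_image_filters(fh.read()).most_common():
            print(f"{codec}: {n}")
```

A non-zero count for JPEG 2000 or JBIG2 suggests the file is a candidate for recompression. The counts are occurrences of the names, so they only approximate the number of images.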

forensmatt commented 2 months ago

Thank you, @slaFFik, for the link to that tool. Very helpful. I am looking forward to seeing how much the integration of the OpenJPEG decoder will improve things and reduce or even eliminate the need for optimization, as JBIG2 and JPEG2000 are far more efficient than JPEG (see the discussion here: https://github.com/mozilla/pdf.js/pull/17946). I'm not sure whether these changes have been implemented in the pdf.js web demo (it says version 4.2.53, while the latest release is 4.1.392 and does not appear to have these changes), but a quick test appears to show significant speed improvements over the older implementation of PDF.js used in Zotero 7 Beta.

mmatela commented 2 months ago

I've tried the 4.2.67 release and it still takes very long to render my example PDF :/

forensmatt commented 2 months ago

@mmatela, perhaps your file has a mask layer. I believe I saw comments by the devs that this update did not address the slowness of masks.
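One rough way to test the mask guess without a preflight tool is to scan the raw PDF bytes for the dictionary keys the PDF specification uses for image masking: /SMask (soft mask), /Mask (explicit or colour-key mask), and /ImageMask true (stencil mask). This is a heuristic sketch only. The function name `count_mask_markers` is hypothetical, the patterns assume the keys are written out in the usual literal form, and a hit does not prove that masks are what makes rendering slow.

```python
import re

# Byte patterns for the masking keys defined by the PDF specification.
MASK_PATTERNS = {
    # Soft mask given as an indirect reference, e.g. "/SMask 12 0 R".
    "soft mask (/SMask)": re.compile(rb"/SMask\s+\d+\s+\d+\s+R"),
    # Stencil mask flag set on an image dictionary.
    "stencil mask (/ImageMask true)": re.compile(rb"/ImageMask\s+true"),
    # Explicit mask (indirect reference) or colour-key mask (array).
    "explicit or colour-key mask (/Mask)": re.compile(
        rb"/Mask\s*(?:\[|\d+\s+\d+\s+R)"
    ),
}


def count_mask_markers(pdf_bytes: bytes) -> dict:
    """Count masking-related keys in raw PDF bytes (heuristic)."""
    return {
        label: len(pattern.findall(pdf_bytes))
        for label, pattern in MASK_PATTERNS.items()
    }


def report(path: str) -> None:
    """Print the marker counts for the PDF at the given path."""
    with open(path, "rb") as fh:
        for label, n in count_mask_markers(fh.read()).items():
            print(f"{label}: {n}")
```

Calling `report("539_2671.pdf")` on a local copy of the example file would print the three counts. If all of them are zero, the slowness probably has another cause.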