Open mmatela opened 6 months ago
The image is slow because it uses JPEG2000 and JBIG2 compression. If you recompress to JPEG you will see a dramatic improvement in loading speed.
@forensmatt Can I please ask how you identified that the PDF file contains images in those formats? I think I sometimes run into this issue as well, and I am currently trying to understand how to mitigate it, or at least how to instruct users.
I use the preflight tools in Acrobat Pro (2020). I'm pretty sure you can get similar info through Foxit Editor, which has a free trial: https://www.foxit.com/pdf-editor/
Thank you, @forensmatt.
I have also found that this tool, https://www.metadata2go.com/view-metadata, shows how the images inside the PDF are encoded.
If you upload the file (obviously, not a confidential one) and scroll down to the pdf_images section on the results page, you can check the values of the encoding attribute there. If it's jpx or jbig2, that PDF file may benefit from optimization. Ideally, the value should be jpeg with interpolation=false, I guess.
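For files that cannot be uploaded to an online service, a rough local check is possible. The sketch below is only a heuristic, not how metadata2go works: it scans the raw bytes of a PDF for the standard stream filter names from the PDF specification (DCTDecode, JPXDecode, JBIG2Decode, CCITTFaxDecode) and counts them. The function name `count_image_filters` and the short labels are assumptions chosen to mirror the jpeg / jpx / jbig2 values mentioned above. It does not parse the PDF structure, so it will miss inline images that use abbreviated filter names.

```python
import re
from collections import Counter

# Standard PDF stream filter names mapped to short labels.
# The labels are an illustrative choice that mirrors the values quoted above.
FILTER_LABELS = {
    b"DCTDecode": "jpeg",
    b"JPXDecode": "jpx",        # JPEG2000
    b"JBIG2Decode": "jbig2",
    b"CCITTFaxDecode": "ccitt",
}


def count_image_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of image-related filter names in raw PDF bytes.

    Heuristic only: this is a byte scan, not a PDF parser.
    """
    counts = Counter()
    for name, label in FILTER_LABELS.items():
        # Match "/Name" only when it is not followed by another name character.
        pattern = rb"/" + re.escape(name) + rb"(?![A-Za-z0-9])"
        found = len(re.findall(pattern, pdf_bytes))
        if found:
            counts[label] = found
    return counts
```

To try it on a local copy of a file (the filename here is hypothetical), read it in binary mode and print the result: `print(count_image_filters(open("example.pdf", "rb").read()))`. Non-zero jpx or jbig2 counts suggest the file is a candidate for recompression.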
Thank you, @slaFFik, for the link to that tool. Very helpful. I am looking forward to seeing how much the integration of the OpenJPEG decoder will improve things and reduce or even eliminate the need for optimization, as JBIG2 and JPEG2000 are far more efficient than JPEG (see the discussion here: https://github.com/mozilla/pdf.js/pull/17946). I'm not sure whether these changes have been implemented in the pdf.js web demo (it says version 4.2.53, while the latest release is 4.1.392 and does not appear to have these changes), but a quick test appears to show significant speed improvements over the older implementation of PDF.js used in Zotero 7 Beta.
I've tried the 4.2.67 release and it still takes very long to render my example PDF :/
@mmatela, perhaps your file has a mask layer. I believe I saw comments by the devs that this update did not address the slowness of masks.
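One rough way to test the mask hypothesis locally is to look for the mask-related dictionary keys defined in the PDF specification: /SMask (soft mask), /Mask (explicit or color-key mask) and /ImageMask (stencil mask). The sketch below is an assumption-laden heuristic, and `count_mask_keys` is a hypothetical helper name. It scans raw bytes rather than parsing the document, so a /SMask entry in a graphics state dictionary is counted the same as one on an image.

```python
import re

# Mask-related keys from the PDF specification.
MASK_KEYS = (b"SMask", b"Mask", b"ImageMask")


def count_mask_keys(pdf_bytes: bytes) -> dict:
    """Count mask-related keys in raw PDF bytes (heuristic byte scan)."""
    result = {}
    for key in MASK_KEYS:
        # Require "/" directly before the key and no name character after it,
        # so "/Mask" does not match inside "/SMask" and "/SMask" does not
        # match "/SMaskInData".
        pattern = rb"/" + re.escape(key) + rb"(?![A-Za-z0-9])"
        result[key.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return result
```

If all three counts are zero, masks are unlikely to explain the slowness. Non-zero counts only indicate that masks may be present and are worth a closer look in a proper PDF inspection tool.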
Attach (recommended) or Link to PDF file here: https://sbc.org.pl/Content/403918/PDF/539_2671.pdf
Configuration:
Steps to reproduce the problem:
What is the expected behavior? The document should be displayed in a timely manner.
What went wrong? It takes around 20 seconds on my high-end laptop to display the document. It feels around 50 times slower than Chrome's built-in PDF viewer. Changing the zoom level also takes a long time.
Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension): https://sbc.org.pl/formats/pdf/web/viewer.html?file=https%3A%2F%2Fsbc.org.pl%2FContent%2F403918%2FPDF%2F539_2671.pdf%3Fhandler%3Dpdf#locale=pl