mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Opening 24 MB PDF file with graphics takes very long and 100 % CPU usage #18307

Open paulmenzel opened 1 week ago

paulmenzel commented 1 week ago

Attach (recommended) or Link to PDF file here:

425429.pdf from https://www.osce.org/files/f/documents/f/5/425429.pdf, 24 MB, MD5: 63f989b7058bb5bc2a48319ab3ca9139

Configuration:

Steps to reproduce the problem:

  1. Open the PDF
  2. Notice that rendering takes several minutes and one CPU thread stays at 100 % usage

What is the expected behavior? (add screenshot)

It should work

What went wrong? (add screenshot)

It didn’t render instantly.

PS: Evince 46.3 also has trouble rendering this.

calixteman commented 1 week ago

How do you expect a file containing 51 JPEG images of 9933x7016 pixels each to render instantly!? I have a powerful workstation with a lot of RAM, and it took 80 s in Chrome (vs. 180 s in Firefox); I gave up trying to render it with Acrobat. I understand that from your point of view it's just a single basic page in the end, but internally this PDF is awful. I'm not sure we have much room for improvement here.
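To put the 51 images of 9933x7016 pixels in perspective, here is a rough back-of-the-envelope calculation of the decoded pixel data. It assumes 4 bytes per decoded pixel (8-bit RGBA), which is an illustrative assumption and not something stated in this thread; a renderer also does not necessarily keep every decoded image in memory at the same time.

```python
# Rough size of the decoded image data in the reported PDF.
# From the thread: 51 JPEG images, each 9933x7016 pixels.
# Assumption (not from the thread): 4 bytes per decoded pixel (8-bit RGBA).

WIDTH, HEIGHT = 9933, 7016
IMAGE_COUNT = 51
BYTES_PER_PIXEL = 4

pixels_per_image = WIDTH * HEIGHT
bytes_per_image = pixels_per_image * BYTES_PER_PIXEL
total_bytes = bytes_per_image * IMAGE_COUNT

print(f"{pixels_per_image / 1e6:.1f} megapixels per image")
print(f"{bytes_per_image / 2**20:.0f} MiB per decoded image")
print(f"{total_bytes / 2**30:.1f} GiB for all {IMAGE_COUNT} images")
```

Under that assumption, each image is roughly 70 megapixels and 266 MiB once decoded, and the whole set comes to about 13 GiB of pixel data, even though the file itself is only 24 MB.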

paulmenzel commented 1 week ago

Thank you for the analysis.

> How do you expect a file containing 51 JPEG images of 9933x7016 pixels each to render instantly!?

;-)

> I have a powerful workstation with a lot of RAM, and it took 80 s in Chrome (vs. 180 s in Firefox); I gave up trying to render it with Acrobat.

Should this be reported to Firefox to be at least on par with Chrome?

> I understand that from your point of view it's just a single basic page in the end, but internally this PDF is awful. I'm not sure we have much room for improvement here.

Understood. Feel free to close.

I wonder what machine the creator of the document used five years ago. The document properties say:

Adobe Photoshop for Windows -- Image Conversion Plug-in Adobe Photoshop CC (Windows) Wed 19 Dec 2018 14:28:24 +02:00 Thu 11 Jul 2019 10:29:41 +03:00

I guess I need to switch to Microsoft Windows.

calixteman commented 1 week ago

I'm on Windows 11 and it's slow. Maybe it was fine in Photoshop and then they created the PDF without checking that the result is "correct" (you know, like the kind of one-line patch you don't test because it's so obvious... and in the end you have a bug). That said, if you really need to share such a PDF, you should flatten it by printing it to PDF! I have some plans to use a built-in JPEG decoder, which should seriously improve the situation here, but as I said, 51 images of 9933x7016 pixels is something...
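As a very rough illustration of why flattening could help, the sketch below scales the render times reported earlier in this thread (80 s in Chrome, 180 s in Firefox) by the number of full-size images. It assumes that render time is dominated by image decoding, that it grows linearly with the image count, and that flattening leaves a single image of the same dimensions. None of these assumptions is confirmed in the thread, and `estimated_render_seconds` is a hypothetical helper, so treat the result as an order-of-magnitude guess.

```python
# Order-of-magnitude estimate of the benefit of flattening the page.
# Assumptions (not from the thread): decode time dominates, scales linearly
# with the number of full-size images, and flattening leaves one image.

def estimated_render_seconds(measured_seconds, images_before, images_after):
    """Scale a measured render time by the change in full-size image count."""
    return measured_seconds * images_after / images_before

# Render times for the 51-image file as reported in the thread.
for browser, seconds in (("Chrome", 80), ("Firefox", 180)):
    flattened = estimated_render_seconds(seconds, 51, 1)
    print(f"{browser}: {seconds} s measured, about {flattened:.1f} s estimated after flattening")
```

Under these assumptions the estimate drops from minutes to a few seconds; actual results depend on the decoder, the hardware, and how the flattened PDF is produced.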

paulmenzel commented 1 week ago

> I'm on Windows 11 and it's slow.

I guess you need Microsoft Windows 10 or 7. :P

Thank you for the hint with the printing to PDF.

> I have some plans to use a built-in JPEG decoder, which should seriously improve the situation here, but as I said, 51 images of 9933x7016 pixels is something...

Any idea which decoder is currently used on GNU/Linux (GNOME Shell with X.Org here)?

As written, feel free to close.

timvandermeij commented 1 week ago

If it helps, we have compiled a list of optimization tips for PDF files at https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#what-types-of-pdf-files-are-slow-in-pdfjs-can-i-optimize-a-pdf-file-to-make-pdfjs-faster that could make a significant difference if you're in control of the PDF file.

MrSuddenJoy commented 1 day ago

@paulmenzel 256GB RAM, 4Gbps connection, downloading + rendering this file took 0.02s.

paulmenzel commented 1 day ago

@MrSuddenJoy, thank you for the feedback. I’d have assumed it’s limited by the CPU resources. Can you also share your CPU model, environment, and version, please, so I can reproduce?

MrSuddenJoy commented 21 hours ago

> I’d have assumed it’s limited by the CPU resources.

@paulmenzel this is the most probable cause. But the question remains: why would someone limit CPU resources per process? The only answer that seems reasonable to me is that pdf.js is hosted on a limited-resource machine (such as shared hosting, a VPS, and the like)...

> your CPU model

Sure thing :) [Screenshot 2024-06-28 at 12 10 38]

paulmenzel commented 21 hours ago

> The only answer that seems reasonable to me is that pdf.js is hosted on a limited-resource machine (such as shared hosting, a VPS, and the like)...

pdf.js is part of the Firefox browser, so it runs on desktops with all kinds of configurations.

MrSuddenJoy commented 20 hours ago

@paulmenzel

calixteman commented 14 hours ago

> @paulmenzel 256GB RAM, 4Gbps connection, downloading + rendering this file took 0.02s.

I don't really know how that's possible... I have a desktop machine (Windows 11) with 64 GB of RAM and 32 cores (3.5 GHz). In Firefox it takes 1 min 50 s to render, 40 s in Chrome and 12 s in Acrobat. The main bottleneck in pdf.js is having to decode the images with a pure JS decoder. @MrSuddenJoy I don't know what you're measuring here, and I don't know whether the final rendering is correct, but I'm a bit doubtful about your 0.02 s... maybe more realistically it's 0.02 h...
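The doubt about the 0.02 s figure can be checked with simple arithmetic using only numbers quoted in this thread: a 24 MB file, a 4 Gbps connection, 51 images of 9933x7016 pixels, and the 12 s Acrobat timing. The sketch assumes "24 MB" means 24 × 10^6 bytes and that the file was actually downloaded rather than served from a cache; both are assumptions, not statements from the thread.

```python
# Sanity check of the claimed 0.02 s for "downloading + rendering".
# Assumptions (not from the thread): 24 MB = 24e6 bytes, no cached copy.

FILE_SIZE_BYTES = 24e6          # file size reported in the issue
LINK_BITS_PER_SECOND = 4e9      # claimed 4 Gbps connection
CLAIMED_SECONDS = 0.02          # claimed total time
TOTAL_PIXELS = 51 * 9933 * 7016 # pixels to decode across all images

# Lower bound on the download alone, at full line rate with zero overhead.
min_download_seconds = FILE_SIZE_BYTES * 8 / LINK_BITS_PER_SECOND

# Decode throughput implied by the claim vs. the fastest time in the thread.
claimed_pixels_per_second = TOTAL_PIXELS / CLAIMED_SECONDS
acrobat_pixels_per_second = TOTAL_PIXELS / 12  # 12 s reported for Acrobat

print(f"Minimum download time: {min_download_seconds:.3f} s")
print(f"Claimed throughput: {claimed_pixels_per_second / 1e9:.0f} gigapixels/s")
print(f"Speed-up over the 12 s Acrobat run: {claimed_pixels_per_second / acrobat_pixels_per_second:.0f}x")
```

Under these assumptions the download alone needs at least 0.048 s, which is already more than the claimed total, and the implied decode rate would be about 600 times faster than the quickest measurement reported here.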