Closed timvandermeij closed 8 years ago
The CPU profiler shows that most of the time is spent in `PartialEvaluator_hasBlendModes`:
https://github.com/mozilla/pdf.js/blob/a0aa781c394a8e5dedbf14c5db0195ab72e8ff46/src/core/evaluator.js#L144-L200
To profile (using Chrome), simply:
Thank you for profiling this! This code seems to loop over (potentially) all ExtGState and XObject dictionaries, and there are a lot of them in this PDF file. I'm afraid the `in` operator might cause delays here.
I have narrowed down the flaw a bit. For the first page, https://github.com/mozilla/pdf.js/blob/a0aa781c394a8e5dedbf14c5db0195ab72e8ff46/src/core/evaluator.js#L182 is triggered almost 31000 times, meaning that we attempt to process a lot of XObjects that we already processed before.
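The redundant work can be sketched with a small stand-in (hedged: `makeRef` and `hasBlendModesSketch` are invented names for illustration, not the real pdf.js code): once a Ref has been seen, later occurrences can be skipped outright.

```javascript
// Illustrative sketch only: deduplicating XObject Refs so each object is
// inspected once, no matter how many times it is referenced. makeRef and
// hasBlendModesSketch are invented names, not pdf.js APIs.
function makeRef(num, gen) {
  return { num, gen, toString() { return `${num}R${gen}`; } };
}

function hasBlendModesSketch(refs) {
  const processed = Object.create(null); // string form of Refs seen so far
  let inspected = 0;
  for (const ref of refs) {
    const key = ref.toString();
    if (key in processed) {
      continue; // already handled: skip the expensive inspection
    }
    processed[key] = true;
    inspected++; // stand-in for fetching and examining the XObject
  }
  return inspected;
}
```

With 31000 references to the same object, only one inspection would remain.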
This seems to do the trick, but I've not had time to run tests yet: https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:issue-6961. Also, I'm not sure what kind of test we can add for this; I will look into this more later tonight.
Funny, I also tried using `getKeys()` in a local test, but with no result, possibly because I missed https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:issue-6961#diff-0b94c2e77a5259f7a728122fdbf9f46aR182. Thanks for looking into this!
So, my patch seems to pass all tests locally, but there are two problems:
`xObject.dict.objId` always seems to equal `xObjects.getRaw(key).toString()` in this PDF file.

The only test I can imagine is a unit test where we assert that the number of processed XObjects is less than it is currently. It's not great, but it's the only kind of test I can come up with, since measuring the runtime is not an option. Otherwise I think it suffices to review the patch, test it manually and make sure that the test suite passes.
Regarding the second point, I would have to look into this more. If they are equal, then the current code should suffice, so I'm not yet sure what the difference is with your patch.
First of all, how do we test this? Given that the run time is hardware/software dependent, I'm not sure how we can assert that a test doesn't run for too long!?
27 seconds for such a simple PDF is excessive. We could create a suite of PDFs whose rendering time is measured (can be as simple as subtracting two time stamps), and then report the results to some central place. If we occasionally look at the results (e.g. a table, or a fancy graph), then we should get a good picture of what rendering times are normal and detect performance regressions.
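The timestamp subtraction could be as small as this sketch (hedged: `renderPage` is a hypothetical stand-in for the actual PDF.js rendering call, not a real API):

```javascript
// Minimal timing-harness sketch: wrap a render call between two timestamps
// and report the elapsed milliseconds. renderPage is a placeholder for the
// real PDF.js rendering entry point.
function timeRender(renderPage) {
  const start = Date.now();
  renderPage();
  return Date.now() - start; // elapsed milliseconds, ready for reporting
}
```

The results could then be collected per PDF into whatever table or graph the central reporting place uses.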
Second of all, I don't understand why the patch actually works ;-)
What exactly is unclear? Replacing `getAll` with `getKeys` seems like an obvious boost; is there something else with your patch that magically improves the runtime?
> What exactly is unclear? Replacing `getAll` with `getKeys` seems like an obvious boost; is there something else with your patch that magically improves the runtime?
The `getKeys` change is not noticeable in the grand scheme of things; the real improvement comes from https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:issue-6961#diff-0b94c2e77a5259f7a728122fdbf9f46aR186.
In practice that check seems, for all intents and purposes, to be equivalent to an already existing check just below (as I commented above):
https://github.com/Snuffleupagus/pdf.js/blob/21a19c1a1cdeb1bb9ddae8a44e7da00c241c899e/src/core/evaluator.js#L193-L197 ought to be enough, since as far as I can tell `xObject.dict.objId` always seems to equal `xObjects.getRaw(key).toString()` in this PDF file.
Using `getAll` results in traversing the whole tree and fetching Refs. Using `getKeys` merely requires looking up all keys. But I think that if you call `fetch` on every Ref value, then you get a similar runtime as when you use `getAll`. By skipping `fetch` if the Ref is already known, you save the overhead of resolving Refs. (This is my guess; I didn't check whether it is really the reason.)
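That difference can be sketched with a toy dictionary (hedged: everything below is a simplified stand-in with invented names, not the real pdf.js `Dict`/`XRef` classes): many keys can point at the same Ref, so skipping already-known Refs avoids resolving the same object repeatedly.

```javascript
// Toy model of getAll vs. getKeys + getRaw; the fetch counter stands in for
// the cost of resolving a Ref. None of this is the real pdf.js API.
let fetchCount = 0;

const xref = {
  fetch(ref) {
    fetchCount++;          // resolving a Ref is the expensive step
    return { objId: ref }; // toy "fetched object"
  },
};

function makeDict(map) {
  return {
    getKeys() { return Object.keys(map); }, // cheap: keys only, no fetches
    getRaw(key) { return map[key]; },       // the unresolved Ref string
    getAll() {                              // resolves every entry eagerly
      const all = {};
      for (const key of Object.keys(map)) {
        all[key] = xref.fetch(map[key]);
      }
      return all;
    },
  };
}

// Fetch each distinct Ref once, skipping Refs resolved earlier.
function fetchUniqueRefs(dict) {
  const seen = Object.create(null);
  for (const key of dict.getKeys()) {
    const ref = dict.getRaw(key);
    if (ref in seen) continue; // Ref already resolved: skip the fetch
    seen[ref] = true;
    xref.fetch(ref);
  }
}
```

With two keys sharing one Ref, `getAll` performs three fetches while the deduplicated loop performs two; on a file where one XObject is referenced tens of thousands of times, the gap becomes dramatic.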
The reasoning above is somewhat right. We had a thought to disable `getAll()`, not for performance reasons but because it can recursively pull unneeded data into the operator list. We probably need to review all `getAll` usages and remove them.
@Rob--W You are absolutely correct, I don't know how I missed that myself. Thank you!
I have created a simple PDF file using Scribus 1.5.0svn. Notice that the file size is large for a two-page PDF file, so I can only suspect that Scribus is doing something really inefficient when exporting the PDF file. Nevertheless, the PDF file below renders instantly (within 0.5 seconds) with both Adobe Acrobat Reader DC and Foxit Reader, whereas PDF.js takes 27 seconds to render only the first page of this file. I have no idea why PDF.js is taking such an excessive amount of time, but we need to do better here, given that other viewers do not have any problems with this file. Are there perhaps inefficient patterns in this file that the optimizer could remove? I notice that the PDF file contains an excessive number of resources in the Resources dictionary of each page (XObject, Font, Pattern and ExtGState).
Below is the PDF file. I made this myself, so anyone is free to use this as a test case in a PR that addresses this issue: test.pdf