mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
47.14k stars 9.82k forks source link

Fix reference test "pr12564" #18259

Closed timvandermeij closed 2 weeks ago

timvandermeij commented 2 weeks ago

In PR #18257 we have seen the following intermittent failure in the reference tests:

TEST-UNEXPECTED-FAIL | test failed pr12564 | in chrome | page1 round 1 | render : Error: Requesting object that isn't resolved yet img_p0_15.

It looks like this happened only once before, in PR #18167, which may or may not be where this problem started. We should try to reproduce the issue locally. One idea is to repeatedly open/close the file until it happens so we can hopefully extract some debugging information to find out what goes wrong here.

timvandermeij commented 2 weeks ago

Good news: I have managed to find a way to reproduce the issue locally on Linux. Bad news: even with dumpio set nothing is logged :(

To reproduce the issue:

  1. Replace the test manifest with the following copy that simply runs the PDF file 50 times:
[
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true},
  {"id": "pr12564", "file": "pdfs/pr12564.pdf", "md5": "24a19949a2541b960363832cf141f2f2", "rounds": 1, "type": "eq", "annotations": true}
]
  1. Set dumpio: true in the Puppeteer options.
  2. Remove or comment out this code to prevent log spam about duplicated tests: https://github.com/mozilla/pdf.js/blob/master/test/test.mjs#L722-L735
  3. Run npx gulp botbrowsertest.

So far I've seen at least one occurrence of the error in every run except for one, so it might need a few tries before you hit it, but then you'll see this:

TEST-PASS | eq test pr12564 | in firefox
TEST-UNEXPECTED-FAIL | test failed pr12564 | in chrome | page1 round 1 | render : Error: Requesting object that isn't resolved yet img_p0_1

Sadly there is no more debugging information from dumpio: true to help us, but now that we can reproduce it somewhat consistently (probably as consistent as it'll get given the extremely intermittent nature) we can probably instrument PDF.js or Puppeteer with more logging.

timvandermeij commented 2 weeks ago

I think we're getting closer now. I have checked out to commit 53dfb5a6baebe7ceff49f702366ae55963cd4f17 which is the one right before the DecompressionStream commit, and having run the steps above 5 times (so 250 runs of the document) resulted in the error not appearing at all. However, if I then check out commit 9654ad570aa3a36e6a507f3e821aacb763d94b31, which is the DecompressionStream commit, the error is triggered in the very first run already. This does seem to point at commit 9654ad570aa3a36e6a507f3e821aacb763d94b31 somehow causing this. I have checked that no exceptions are thrown by DecompressionStream itself, so we don't get into https://github.com/mozilla/pdf.js/commit/9654ad570aa3a36e6a507f3e821aacb763d94b31#diff-41c1577a4242a139eb7b15bace590444248dd2a8b16b24a80d5b8168d7098cafR182, so I guess it must be something that surrounds this code...

Hopefully the above, in combination with the local reproducer, helps to debug this further since as this point I'm not really sure how to proceed here.

calixteman commented 2 weeks ago

So the data returned by the DecompressionStream or by our own decompressor are the same. Interestingly, when adding few console.log everywhere with some array inside increased the likeliness to have a failure. Finally in just doing this stupid thing:

diff --git a/src/core/evaluator.js b/src/core/evaluator.js
index 5e31d0d92..4d03a9193 100644
--- a/src/core/evaluator.js
+++ b/src/core/evaluator.js
@@ -857,6 +857,10 @@ class PartialEvaluator {
         if (cacheGlobally) {
           this.globalImageCache.addByteSize(imageRef, imgData.dataLen);
         }
+
+        if (objId === "img_p0_15") {
+          await new Promise(r => setTimeout(r, 100));
+        }
         return this._sendImgData(objId, imgData, cacheGlobally);
       })
       .catch(reason => {

I manage to reproduce the issue with either Chrome or Firefox.

Snuffleupagus commented 2 weeks ago

Replace the test manifest with the following copy that simply runs the PDF file 50 times:

FYI: Note that there's an easy way to repeat a test any number of times, by setting "rounds": 50, (or any other value) instead of copy-pasting the test-entry itself.

Finally in just doing this stupid thing:

That diff seems to reproduce even without commit 9654ad570aa3a36e6a507f3e821aacb763d94b31, so there's probably some older intermittent issue where we're not correctly awaiting dependencies in src/display/canvas.js.

calixteman commented 2 weeks ago

Yep, my explanation is the following (I'm not sure it's correct):

This way the dependency will be added only when the draw is done but the draw requires the dependency ! The decompression stream patch probably just changed the timing and revealed this bug.

Snuffleupagus commented 2 weeks ago

The question is why it doesn't work, since this code should already ensure that we wait for the dependency to resolve: https://github.com/mozilla/pdf.js/blob/06800cd966111f7262d3334dfb68c3765bc92740/src/display/canvas.js#L969-L978

calixteman commented 2 weeks ago

Because the main thread receives the dependency when it's occupied to draw. Maybe we could just send the dependency before sending the paintImageXObject. Wdyt ?

Snuffleupagus commented 2 weeks ago

Because the main thread receives the dependency when it's occupied to draw.

Yes, but the existing code in src/display/canvas.js, as linked above, should already be able to handle that case by waiting for the dependency before drawing starts.

Maybe we could just send the dependency before sending the paintImageXObject. Wdyt ?

Sorry, but that sounds like a hack and we really need to figure out and fix the "real" error here.

calixteman commented 2 weeks ago

I don't think it's a hack. We send the draw instruction: https://github.com/mozilla/pdf.js/blob/06800cd966111f7262d3334dfb68c3765bc92740/src/core/evaluator.js#L798 and then we send the dependency few lines after: https://github.com/mozilla/pdf.js/blob/06800cd966111f7262d3334dfb68c3765bc92740/src/core/evaluator.js#L868

and in the main thread we're executing the draw instruction, but the dependency message is still waiting in worker message queue. From my pov, the dependency should be always sent before the drawing instruction in order to make sure it's here before using it.

calixteman commented 2 weeks ago

All the message queue is drained before the execution of micro/macro tasks in the main thread. So if the messages with draw instruction and dependency are in the same chunk, whatever the order is, then they will be treated before drawing, consequently the dependency is there when the paintImage wants to get it. But if they aren't in the same chunk, then drawing is executed before the dependency message has been treated.

Snuffleupagus commented 2 weeks ago

From my pov, the dependency should be always sent before the drawing instruction in order to make sure it's here before using it.

But then you block further worker-thread parsing on the image-data being completely decoded, and the point of the canvas.js code is to not have to do that. Besides, I now think that I know what the "real" culprit is and I've got a tentative patch...

Edit: Basically, this is my fault; sorry about that!

timvandermeij commented 2 weeks ago

Thank you both for helping out with this! I like how all the puzzle pieces combined helped to solve the issue really quickly, since given how intermittent this was I was initially a bit afraid that this was going to be a time-consuming one to track down.