mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Error when I try to view a pdf ( uncaught exception: Page index 0 not found. ) #8088

Closed mus2007 closed 7 years ago

mus2007 commented 7 years ago

Link to PDF file (or attach file here):

TestPDF.pdf

Configuration:

Steps to reproduce the problem:

  1. Go to https://mozilla.github.io/pdf.js/web/viewer.html
  2. Open the attached document in the demo viewer.
  3. I get the error: uncaught exception: Page index 0 not found.

timvandermeij commented 7 years ago

I can reproduce this on Arch Linux. The exception is being thrown here: https://github.com/mozilla/pdf.js/blob/cfaa621a05cb5360f6834cf96a16c44c1a9aed5b/src/core/obj.js#L514. I'm guessing this indicates that the PDF file is corrupted, but I'm not entirely sure.

Edit: The file actually looks fine when viewed in the object browser. It may also be a regression. A build from 5 March 2015 is already broken, so if it's a regression, it has been broken for a long time now.

Snuffleupagus commented 7 years ago

This is unfortunately (another) regression from PR #3848.

It appears that PR #5655 wasn't quite general enough to handle all possible cases of empty Kids nodes, and from a cursory look at this issue I'm not seeing an immediate way to fix it. (Well, just getting rid of src/core/obj.js#L497-L508 would do the trick. That was actually my first naive solution to issue #5644, note the obsolete comment in PR #5655, but it was rejected during review.)

@brendandahl Any ideas how we can fix this issue, and not regress #5644 while doing so, that doesn't simply mean removing the optimization in src/core/obj.js#L497-L508?

Snuffleupagus commented 7 years ago

Having looked at this a bit more today, I'm still not able to come up with a general solution (except removing the optimization) that is guaranteed to always work regardless of where in the Pages tree an empty Kids node is encountered.

However, I'm not convinced that (potentially) parsing all Kids at a certain level (usually the bottom) of the tree is really that big of an issue in practice. In the default viewer, besides the disableAutoFetch mode, we're fetching all pages during loading anyway; see web/pdf_viewer.js#L419-L434. Since the code in the viewer already forces us to fetch all Page dictionaries, removing the optimization should really only (possibly) hurt the disableAutoFetch mode. Hence I'm thinking that having a properly working Catalog.getPageDict method ought to be much more important than keeping the number of requests in disableAutoFetch mode to an absolute minimum.

One possible downside to removing the optimization is obviously that we'd need to spend more time inside of the Catalog.getPageDict method. However, with a bit of caching we should be able to reduce the amount of redundant fetchAsync calls considerably.
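To illustrate the caching idea in the paragraph above, here is a minimal sketch on a toy model. It is not pdf.js code: `createPageFinder`, `fetchNode`, and the `Map` of reference strings are hypothetical stand-ins, and the lookup is synchronous for brevity, whereas the real code goes through fetchAsync. Every kid before the target is inspected, so an empty node cannot be missed, and the cache keeps later lookups from re-fetching siblings that were already inspected.

```javascript
// Toy model (assumption): `objects` maps reference strings to node objects,
// and fetchNode stands in for an expensive lookup that we want to minimize.
function createPageFinder(objects) {
  let fetchCount = 0;
  const pageCounts = new Map(); // ref -> number of pages at or below that ref

  function fetchNode(ref) {
    fetchCount++;
    return objects.get(ref);
  }

  // Number of pages contributed by `ref`: 1 for a Page, Count for a Pages node.
  // The result is cached so repeated lookups do not fetch the node again.
  function pagesBelow(ref) {
    if (pageCounts.has(ref)) return pageCounts.get(ref);
    const node = fetchNode(ref);
    const total = node.Type === "Page" ? 1 : node.Count;
    pageCounts.set(ref, total);
    return total;
  }

  // Return the ref of the Page with the given zero-based index.
  function getPageRef(rootRef, pageIndex) {
    let ref = rootRef;
    let remaining = pageIndex;
    for (;;) {
      const node = fetchNode(ref);
      if (node.Type === "Page") {
        if (remaining === 0) return ref;
        break;
      }
      let next = null;
      for (const kid of node.Kids) {
        const n = pagesBelow(kid); // empty nodes contribute 0 and are skipped
        if (remaining < n) { next = kid; break; }
        remaining -= n;
      }
      if (next === null) break;
      ref = next;
    }
    throw new Error(`Page index ${pageIndex} not found.`);
  }

  return { getPageRef, getFetchCount: () => fetchCount };
}

// Usage: a root whose first kid is an empty Pages node.
const toyObjects = new Map([
  ["root", { Type: "Pages", Count: 2, Kids: ["a", "b"] }],
  ["a", { Type: "Pages", Count: 0, Kids: [] }],
  ["b", { Type: "Pages", Count: 2, Kids: ["p1", "p2"] }],
  ["p1", { Type: "Page" }],
  ["p2", { Type: "Page" }],
]);
const finder = createPageFinder(toyObjects);
console.log(finder.getPageRef("root", 1)); // "p2"
```

In this toy, a second lookup for a different page reuses the cached counts and only fetches the nodes on the path to the target.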

So in closing, this is the only thing I'm able to come up with: https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:issue-8088. If that isn't even a remotely acceptable solution, then I'm unfortunately all out of ideas here :-(

brendandahl commented 7 years ago

> However, I'm not convinced that (potentially) parsing all Kids at a certain level (usually the bottom) of the tree is really that big of an issue in practice.

We fetch all the pages, but not until one page is rendered. If we traverse all the pages at once, this ends up basically forcing us to fetch all data for the PDF and removes all the benefits of chunked loading. I haven't tested this in a while (and hopefully we haven't regressed it), but the benefits of this approach were really noticeable on slow (or throttled) connections with big PDFs.

Snuffleupagus commented 7 years ago

> If we traverse all the pages at once this ends up basically forcing us to fetch all data for the pdf and removes all the benefits of chunked loading.

I'm not actually seeing this with my patch(es), since unless I'm mistaken the only difference should be when we're at the bottom of the tree. Please let me know if I'm totally off with the example below!

Looking at the tracemonkey file, please refer to the screenshot below. Now assuming that we want to get the Page dict referred to by 122 0 R, then with the current code we can just go directly from the node referred to by 40 0 R to the destination. With my patches, we'll have to check the nodes before that one as well (i.e. 2 0 R, 43 0 R, 104 0 R, 110 0 R, 116 0 R) so that we don't accidentally miss an empty node. Obviously that is slower than the current code, but it's not the entire tree, and normally those nodes will already have been fetched by previous getPageDict calls anyway (for smaller pageIndex values).

[Screenshot: pages_tree]
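The trade-off described in this comment can be sketched on a toy tree. This is not the code from src/core/obj.js: it assumes the optimization amounts to "if a node's Count equals its number of Kids, treat every kid as a Page and index directly", and the helpers `page`, `pages`, and `findPage` are hypothetical. With an empty Kids node in the tree, the shortcut lands on the wrong kid, while scanning the preceding siblings still finds the page.

```javascript
// Toy page tree (assumption): nodes are plain objects and Kids holds nodes directly.
const page = (name) => ({ Type: "Page", name });
const pages = (...Kids) => ({
  Type: "Pages",
  Kids,
  // Count is the number of Page leaves below this node.
  Count: Kids.reduce((n, kid) => n + (kid.Type === "Page" ? 1 : kid.Count), 0),
});

// Find the Page with the given zero-based index.
// useShortcut = true models the assumed optimization; false scans every kid in order.
function findPage(root, pageIndex, useShortcut) {
  let node = root;
  let skipped = 0; // pages known to lie before `node`
  while (node && node.Type === "Pages") {
    if (useShortcut && node.Count === node.Kids.length) {
      // Assumes all kids are Page leaves, which an empty Pages kid violates.
      node = node.Kids[pageIndex - skipped];
      skipped = pageIndex;
      continue;
    }
    let next;
    for (const kid of node.Kids) {
      const n = kid.Type === "Page" ? 1 : kid.Count;
      if (pageIndex - skipped < n) { next = kid; break; }
      skipped += n; // also steps over empty nodes, which contribute 0 pages
    }
    node = next;
  }
  if (!node) throw new Error(`Page index ${pageIndex} not found.`);
  return node;
}

// Root has Count 2 and two Kids, but the first kid is an empty Pages node.
const brokenTree = pages(pages(), pages(page("A"), page("B")));

try {
  findPage(brokenTree, 0, true);
} catch (e) {
  console.log(e.message); // "Page index 0 not found."
}
console.log(findPage(brokenTree, 0, false).name); // "A"
```

The scanning variant still trusts each node's Count, so it only protects against empty nodes, not against Count values that are themselves wrong.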

gustavomassa commented 4 years ago

Hello,

Any workarounds for this issue? I'm facing the error: Page index 1 not found.

"pdfjs": "^2.3.1",
"pdfjs-dist": "^2.2.228"

I'm receiving the error when trying to getPage 2 of a PDF that has 2 pages. The PDF file is not corrupted: the first page is rendered normally, but the second one generates the error. When the file is downloaded, it opens normally in any PDF viewer and shows both pages.

        pdfDoc.getPage(num).then(function (page) {
          console.log("getPage: ", page);
          var viewport = void 0;
          var pageWidthScale = void 0;
          var renderContext = void 0;

          if (pageFit) {
            viewport = page.getViewport(1);
            var clientRect = element[0].getBoundingClientRect();
            pageWidthScale = clientRect.width / viewport.width;
            if (limitHeight) {
              pageWidthScale = Math.min(pageWidthScale, clientRect.height / viewport.height);
            }
            scale = pageWidthScale;
          }
          viewport = page.getViewport(scale);

          setCanvasDimensions(canvas, viewport.width, viewport.height);

          renderContext = {
            canvasContext: ctx,
            viewport: viewport
          };

          renderTask = page.render(renderContext);
          renderTask.promise.then(function () {
            if (angular.isFunction(scope.onPageRender)) {
              scope.onPageRender();
            }
          }).catch(function (reason) {
            $log.log(reason);
          });
        }).catch(function (ex) {
          $log.log(ex);
        });

ghost commented 3 years ago

Hello there, we are currently also facing this issue. Our PDF works fine in all browsers except Firefox. Is there any workaround? We have full control over the PDF and its rendering in the UI and could modify it if needed.

//Edit: After digging through the PDF spec for a bit, we've found the issue. The type was set to "Page" and not "Pages". After adjusting this, the PDF works as expected in FF - https://github.com/PatrickSachs/erlguten/commit/8254679b857951a06435a135b6c159b5b3eaf3ec
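For anyone generating their own PDFs, a check along these lines may help catch the same mistake before the file reaches a viewer. It is only a sketch on plain objects, not a PDF parser: `findMistypedNodes` is a hypothetical helper, and it assumes the page tree has already been read into objects with `Type` and `Kids` keys. It relies on the rule that page tree nodes (the ones with Kids) are typed "Pages" and leaf page objects are typed "Page".

```javascript
// Toy check (assumption): each node is a plain object mirroring the dictionary keys.
// Returns a list of human-readable problems; an empty list means no mismatch was found.
function findMistypedNodes(node, path = "root", problems = []) {
  const hasKids = Array.isArray(node.Kids);
  if (hasKids && node.Type !== "Pages") {
    problems.push(`${path}: has Kids but Type is "${node.Type}"`);
  }
  if (!hasKids && node.Type !== "Page") {
    problems.push(`${path}: leaf node but Type is "${node.Type}"`);
  }
  (node.Kids || []).forEach((kid, i) =>
    findMistypedNodes(kid, `${path}.Kids[${i}]`, problems)
  );
  return problems;
}

// A tree node wrongly typed "Page", similar to the mistake described above.
const mistyped = { Type: "Page", Kids: [{ Type: "Page" }, { Type: "Page" }] };
console.log(findMistypedNodes(mistyped));
// [ 'root: has Kids but Type is "Page"' ]
```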