mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Problem rendering Image PDFs #2802

Closed ashmashrest closed 9 years ago

ashmashrest commented 11 years ago

I am trying to render image PDFs, but Firebug displays the error below. The PDFs are large scanned TIFF PDFs.

An error occurred while rendering the page.
PDF.js v0.7.236 (build: f8e70dc)
Message: JBIG2 error: number of instances > 1 is not supported
Stack:
error@resource://pdf.js/build/pdf.js:661
decodeSymbolDictionary@resource://pdf.js/build/pdf.js:36074
SimpleSegmentVisitor_onSymbolDictionary@resource://pdf.js/build/pdf.js:36569
processSegment@resource://pdf.js/build/pdf.js:36444
processSegments@resource://pdf.js/build/pdf.js:36449
parseJbig2Chunks@resource://pdf.js/build/pdf.js:36477
Jbig2Image_parseChunks@resource://pdf.js/build/pdf.js:36604
Jbig2Stream_ensureBuffer@resource://pdf.js/build/pdf.js:31986
DecodeStream_getBytes@resource://pdf.js/build/pdf.js:31100
PDFImage_getImageBytes@resource://pdf.js/build/pdf.js:27004
PDFImage_fillRgbaBuffer@resource://pdf.js/build/pdf.js:26947
PDFImage_getImageData@resource://pdf.js/build/pdf.js:26999
pdfjsWrapper/PartialEvaluator_getOperatorList/buildPaintImageXObject/<@resource://pdf.js/build/pdf.js:14553
pdfjsWrapper/PDFImage_buildImage/<@resource://pdf.js/build/pdf.js:26692
Promise_resolve@resource://pdf.js/build/pdf.js:1205
pdfjsWrapper/Promise_all/<@resource://pdf.js/build/pdf.js:1152
Promise_resolve@resource://pdf.js/build/pdf.js:1205
PDFImage_buildImage@resource://pdf.js/build/pdf.js:26715
buildPaintImageXObject@resource://pdf.js/build/pdf.js:14555
PartialEvaluator_getOperatorList@resource://pdf.js/build/pdf.js:14691
Page_getOperatorList@resource://pdf.js/build/pdf.js:221
wphSetupRenderPage@resource://pdf.js/build/pdf.js:33557
messageHandlerComObjOnMessage@resource://pdf.js/build/pdf.js:33376

yurydelendik commented 11 years ago

Please provide links to the sample PDFs.

ashmashrest commented 11 years ago

Here is a link to a sample PDF: http://ashmashrestha.com/sites/default/files/4838.pdf

ashmashrest commented 11 years ago

Four days have passed, but I don't see any input on this issue. My project is stuck because of PDF.js, and I would like to know whether this will ever be fixed or whether I have to look into other viewer options.

yurydelendik commented 11 years ago

Works for me on 0.7.305. Closing as resolved

timvandermeij commented 11 years ago

Doesn't work for me on 0.7.305 and Windows 7. I get a blank page with this in the console:

Warning: TODO: graphic state operator BM

which probably makes it a duplicate of many other bugs, but I'm just letting you know. Should we re-open this?

ashmashrest commented 11 years ago

This PDF was created with CVISION PDF Compressor, which also runs OCR, and none of the PDFs it creates can be opened in Firefox with PDF.js. This is a big problem now that Firefox 19 is rolling out PDF.js as the default PDF viewer. When I use Firefox's built-in viewer, I don't get an error message in the console, only "PDF document might not be displayed correctly" above the blank page. I also have PDF.js installed on my server to make it the default viewer for my site, rather than relying on the browser-specific viewer, and that is where I ran into the above error.


yurydelendik commented 11 years ago

@timvandermeij, that is what I see as well.

Could be https://bugzilla.mozilla.org/show_bug.cgi?id=728571

@ashmashrest http://ashmashrestha.com/sites/default/files/4838.pdf works for me. Are you using a different file?

ashmashrest commented 11 years ago

Can you try http://ashmashrestha.com/sites/default/files/4839.pdf?

yurydelendik commented 11 years ago

Now I see the exception, thanks

yurydelendik commented 11 years ago

4838.pdf info: [15:59:39.333] PDF 7f1d9794c21fbd136e3bc87b4399810 [1.4 itext-paulo-138 (itextpdf.sf.net-lowagie.com) / -] (PDF.js: 0.7.305)

4839.pdf info: [16:00:13.405] PDF 95a74491dc40c44a9ce61482186aec9 [1.6 CVISION Technologies / PdfCompressor 5.0.420] (PDF.js: 0.7.305)

Big difference.
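The two files come from different producers: 4839.pdf (CVISION) is the one that triggers the JBIG2 exception, while the re-uploaded 4838.pdf is later reported to contain a JPEG (DCTDecode) image. A quick way to see which image filters a file declares is to scan its raw bytes for the filter names. This is a minimal sketch assuming Node.js (for `Buffer`); `findImageFilters` and `findProducer` are hypothetical helpers, not part of pdf.js, and the scan is only a heuristic: filter names stored inside compressed object streams (PDF 1.5 and later) will not be found.

```javascript
// Heuristic scan of a PDF's raw bytes (Node.js: uses Buffer).
// Counts image filter names in uncompressed object dictionaries only;
// filters declared inside compressed object streams are not seen.
function findImageFilters(bytes) {
  // latin1 maps each byte to exactly one character.
  const text = Buffer.from(bytes).toString("latin1");
  const filters = ["JBIG2Decode", "DCTDecode", "CCITTFaxDecode", "JPXDecode"];
  const counts = {};
  for (const name of filters) {
    // Match the PDF name token, e.g. "/JBIG2Decode", but not a longer name.
    const re = new RegExp("/" + name + "(?![A-Za-z0-9])", "g");
    const matches = text.match(re);
    counts[name] = matches ? matches.length : 0;
  }
  return counts;
}

// Returns the /Producer string from an uncompressed Info dictionary, or null.
// Stops at the first ")", so escaped parentheses and hex strings are not handled.
function findProducer(bytes) {
  const text = Buffer.from(bytes).toString("latin1");
  const match = text.match(/\/Producer\s*\(([^)]*)\)/);
  return match ? match[1] : null;
}
```

For example, `findImageFilters(require("fs").readFileSync("4839.pdf"))` returns one count per filter name; a non-zero `JBIG2Decode` count points at the decoder path that fails in the stack trace above.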

ashmashrest commented 11 years ago

Sorry, I was testing different PDF generators and apparently posted the wrong file for review. We are using CVISION to convert the TIFFs to OCRed PDFs and were really hoping that PDF.js would solve our viewer issue.

timvandermeij commented 11 years ago

Strange: neither 4838 nor 4839 works for me. Both are blank, and 4839 gives the huge error posted in the first comment of this issue.

ashmashrest commented 11 years ago

The latest copy of PDF.js resolves the issue for 4838, but the issue remains for 4839.

ashmashrest commented 11 years ago

@yurydelendik, not trying to be a pain, but are there any updates on when this issue will be fixed? It has really brought the project to a halt.

yurydelendik commented 11 years ago

@ashmashrest, there are not many issues similar to this one. The CVISION generator is not used that much. We have not yet set a date for resolving this issue.

The JBIG2 format is not a complex one. If you are a developer, you can try fixing it yourself: the error is raised here https://github.com/mozilla/pdf.js/blob/master/src/jbig2.js#L542, and this feature is described in section 6.5.8.2 (Refinement/aggregate-coded symbol bitmap) of http://www.jpeg.org/public/fcd14492.pdf#page=63

ashmashrest commented 11 years ago

@yurydelendik, I was looking into the code earlier and trying to trace the error. One question that came to mind: JBIG2 is an image format, but my PDFs were created from TIFFs. As for the code:

// 6.5.8.2 Refinement/aggregate-coded symbol bitmap
var numberOfInstances = decodeInteger(contextCache, 'IAAI', decoder);
if (numberOfInstances > 1) {
  error('JBIG2 error: number of instances > 1 is not supported');
}

I am fairly new to the JBIG2 format and would like to know what numberOfInstances means, and, if it is not supported by the JBIG2 decoder, whether we need to use a format other than JBIG2.

yurydelendik commented 11 years ago

> I am kinda new to the JBIG2 format and would like to know what it means numberOfInstances and if it is not supported by JBIG2, is there a need to use a different format other than JBIG2.

It's described in section 6.5.8.2 Refinement/aggregate-coded symbol bitmap of http://www.jpeg.org/public/fcd14492.pdf#page=63
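In that section, numberOfInstances corresponds to REFAGGNINST, decoded with the IAAI integer context: the number of existing symbols a new refinement/aggregate-coded symbol is built from. With exactly one instance, the new symbol is one reference symbol shifted by an offset and then refined; with more than one, the new symbol is an aggregate, decoded like a small text region that places several symbols. The check quoted above rejects the second case. The following is a minimal sketch of both branches under this reading of section 6.5.8.2; every decoding step is a hypothetical callback on `decoders` (not a pdf.js API), and the arithmetic decoding itself is omitted.

```javascript
// Sketch of the two branches of a refinement/aggregate-coded symbol
// (JBIG2 section 6.5.8.2). All decoding steps are hypothetical callbacks
// on `decoders`; no arithmetic decoding is implemented here.
function decodeRefAggSymbol(decoders, inputSymbols, newSymbols, width, height) {
  // REFAGGNINST: how many existing symbols make up the new one (IAAI context).
  const numberOfInstances = decoders.decodeInteger("IAAI");
  // Symbols that can be referenced: the dictionary's input symbols
  // followed by the new symbols decoded so far.
  const available = inputSymbols.concat(newSymbols);
  if (numberOfInstances > 1) {
    // Aggregate: decode the bitmap like a text region that places
    // numberOfInstances instances of the available symbols.
    return decoders.decodeTextRegion(width, height, numberOfInstances, available);
  }
  // Single instance: a reference symbol ID plus an (RDX, RDY) offset,
  // followed by refinement decoding against that reference bitmap.
  const symbolId = decoders.decodeSymbolId();
  const rdx = decoders.decodeInteger("IARDX");
  const rdy = decoders.decodeInteger("IARDY");
  return decoders.decodeRefinement(width, height, available[symbolId], rdx, rdy);
}
```

Passing the decoders in as callbacks keeps the sketch focused on the branch that the old code rejected; in a real decoder the aggregate branch would reuse the existing text region procedure rather than a separate routine.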

gigaherz commented 11 years ago

@ashmashrest The big upside of JBIG2 over plain bitmaps is that it can encode reusable patterns and place them anywhere on the document. This means it can copy the same letter multiple times and then overlay a refinement on top of the letter so that the end result is bit-perfect. It also has multiple compression modes, but those are relatively minor features compared to the patterns.
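The pattern-reuse idea can be illustrated by stamping one stored bitmap at several positions and combining overlapping pixels with OR (one of the combination operators JBIG2 allows for text regions). `placeSymbols` is a hypothetical helper; it assumes bitmaps are arrays of 0/1 rows and positions are top-left corners, and it leaves out the coded positions, reference corners, and refinement step that real JBIG2 text regions use.

```javascript
// Build a bitmap by placing copies of stored symbols at given positions.
// Bitmaps are arrays of rows of 0/1; overlapping pixels are combined with OR.
function placeSymbols(width, height, symbols, instances) {
  const page = Array.from({ length: height }, () => new Array(width).fill(0));
  for (const { id, x, y } of instances) {
    const symbol = symbols[id];
    for (let row = 0; row < symbol.length; row++) {
      for (let col = 0; col < symbol[row].length; col++) {
        const py = y + row;
        const px = x + col;
        // Clip pixels that fall outside the page.
        if (py < 0 || py >= height || px < 0 || px >= width) continue;
        page[py][px] |= symbol[row][col];
      }
    }
  }
  return page;
}

// One 2x2 "letter" stored once and placed twice.
const letter = [[1, 0], [1, 1]];
const page = placeSymbols(5, 3, [letter], [{ id: 0, x: 0, y: 0 }, { id: 0, x: 3, y: 1 }]);
console.log(page.map((row) => row.join("")));
// prints [ '10000', '11010', '00011' ]
```

Only the symbol and its two positions need to be stored, which is where the size savings for scanned text come from.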

There's a file format, DjVu, that competes with PDF, and it uses similar concepts to the JBIG2 encoding to store line art, pictures and text, which works really well for scanned documents. If you ever decide PDF is not for you, take a look at it.

ashmashrest commented 11 years ago

@gigaherz Thanks for the explanation, David. My scenario is a bit unusual: we digitize microfilms, books, and many other resources and make them available to users. We had a few requirements: OCR the scanned TIFFs, deep-compress them, and let users download them easily. We have around 50k scanned (OCRed) PDFs produced with CVISION, none of which open in Firefox with PDF.js. Also, our users, who are usually non-technical, download these resources; if we switched to another file format, they might be confused about how to read the files, whereas PDFs are very common.

drogars commented 11 years ago

I would also be interested in hearing about the progress of this support ticket. We also have a number of clients who are having issues with PDFs we generate from scanned TIFFs or uploaded PDFs. We use JBIG2 to make the final PDFs significantly smaller. What can we do to make this a higher-priority issue?

yurydelendik commented 11 years ago

@drogars, is it the exact same issue your clients are experiencing?

drogars commented 11 years ago

Yes. They are receiving the "JBIG2 error: number of instances > 1 is not supported" error. We are using jbig2enc (a project on GitHub) to take TIFF images and re-encode them for smaller file sizes.

fmms commented 10 years ago

The original PDF is offline. Could someone upload it again? I just want to test with the current pdf.js.

timvandermeij commented 10 years ago

@ashmashrest Please post a new link to the original PDF file as the current link is dead.

ashmashrest commented 10 years ago

Sorry for the delay. Here is the new link: http://www.crl.edu/sites/default/files/attachments/pages/4838.pdf

marcosps commented 10 years ago

I can't reproduce this with Nightly, the development version of pdf.js, and Ubuntu 12.10.

Here, both Evince and pdf.js show the same content.

On 09/17/2013 02:13 PM, ashmashrest wrote:

> Sorry for the delay. Here is the new link http://www.crl.edu/sites/default/files/attachments/pages/4838.pdf


biggert commented 10 years ago

I concur with @marcosps and cannot reproduce with the latest (0.8.554) on Windows 7 x64, Chrome (latest).

timvandermeij commented 10 years ago

I still see nothing using http://www.crl.edu/sites/default/files/attachments/pages/4838.pdf with Windows 7 x64, Firefox 24.0 and PDF.js 0.8.556. How strange...

Edit: There does appear to be a text layer, but the page itself is empty.

ashmashrest commented 10 years ago

I concur with @timvandermeij. It still doesn't work for me on Windows 7 x32, Firefox 24.0 (using the default PDF viewer).

fkaelberer commented 10 years ago

Is it still an issue? Works for me. (Win64, FF27, PDF.js: 0.8.1095)

Snuffleupagus commented 10 years ago

I can still reproduce it, using: Windows 7 (64-bit), Firefox Nightly 30 (buildID: 20140303030201) with HWA on, and PDF.js 0.8.1095.

yurydelendik commented 10 years ago

Fixed by #4511

timvandermeij commented 10 years ago

Unfortunately, #4511 does not fix this for me. Still a blank page using Nightly.

fkaelberer commented 10 years ago

@timvandermeij Confirmed with FF28, HWA=on. The image appears only at large zoom levels (e.g. 400%).

Snuffleupagus commented 10 years ago

> The image appears only at large zoom levels (e.g. 400%).

I can replicate this behaviour too, using: Windows 7 (64-bit), Firefox Nightly 31 (buildID: 20140326030203) with HWA on, and PDF.js 0.8.1310.

pan1490 commented 9 years ago

Has a solution been found for this? I am facing the same issue. Some .tiff files are not displayed by pdf.js, and it throws the following warning/error:

"Warning: Unhandled rejection: [Exception... "Component is not available" nsresult: "0x80040111 (NS_ERROR_NOT_AVAILABLE)" location: "JS frame :: http://tmc.pw.com:8080/dpg2/Viewer.js/pdf.js :: CanvasGraphics_paintJpegXObject :: line 6511" data: no]
CanvasGraphics_paintJpegXObject@http://tmc.pw.com:8080/dpg2/Viewer.js/pdf.js:6511:0
CanvasGraphics_executeOperatorList@http://tmc.pw.com:8080/dpg2/Viewer.js/pdf.js:5467:10
InternalRenderTasknext@http://tmc.pw.com:8080/dpg2/Viewer.js/pdf.js:4846:29
InternalRenderTaskcontinue@http://tmc.pw.com:8080/dpg2/Viewer.js/pdf.js:4838:8
runHandlers@http://tmc.pw.com:8080/dpg2/Viewer.js/pdf.js:810:26"

THausherr commented 9 years ago

http://www.crl.edu/sites/default/files/d6/attachments/pages/4838.pdf works for me, W7 64 bit, FF 40.0.3. The image in the PDF is a JPEG image (DCTDecode).

timvandermeij commented 9 years ago

It now works for me too. I believe that this was a Firefox bug that has been fixed in the most recent version. Closing as fixed.