Closed: pcworld closed this issue 1 year ago.
Hmm, that QR code is 16667x16667 pixels; that's roughly an 8000 dpi image.
@pcworld can you check if you can reduce the size of the image to an acceptable dpi, perhaps 300 dpi? I'm sure other PDF readers and network providers (well, it's only 35k in the PDF) will say thanks :)
@yurydelendik I am not the creator of this PDF. While the resolution of the image is indeed ridiculous, Evince somehow renders it in only a few seconds.
Yeah, we are probably missing an optimization to render such images at a lower resolution. I'm marking the issue with the performance tag.
I have extracted the relevant page: eID_Broschuere-page16.pdf
The culprit is the image mask of size 16667x16667. We should scale down the image.
16 0 obj
<< /BitsPerComponent 1 /DecodeParms << /Columns 16667 /K -1 >> /Filter /CCITTFaxDecode /Height 16667 /ImageMask true /Subtype /Image /Type /XObject /Width 16667 /Length 38972 >>
stream
Implementation details:
To scale down the image, it is probably best to detect large images, slice the image into pieces, transform (i.e. scale) each of those pieces individually (on a canvas), and then paint all pieces together. An alternative approach is to transform the image manually, i.e. interpret the pixels of the image yourself and interpolate the pixel values while scaling. The advantage of the latter is that its runtime performance is likely better for arbitrarily large images, and that the logic can be shared by our canvas and SVG backends (since this is then more of a math problem than a rendering task).
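A rough sketch of the slicing idea, where all names and the tile size limit are illustrative and not pdf.js internals: compute a grid of tiles small enough for the native decoder/canvas, then paint each tile scaled into place.

```javascript
// Hypothetical sketch of the "slice and paint" approach. Only the tiling
// math is shown; the actual canvas painting per tile is indicated in the
// trailing comment. MAX_TILE is a made-up illustrative limit.
const MAX_TILE = 4096;

function computeTiles(width, height, maxTile = MAX_TILE) {
  const tiles = [];
  for (let y = 0; y < height; y += maxTile) {
    for (let x = 0; x < width; x += maxTile) {
      tiles.push({
        x,
        y,
        w: Math.min(maxTile, width - x),
        h: Math.min(maxTile, height - y),
      });
    }
  }
  return tiles;
}

// Each tile would then be drawn scaled onto the destination canvas, e.g.:
//   for (const t of computeTiles(16667, 16667)) {
//     ctx.drawImage(tileCanvas, t.x * scale, t.y * scale,
//                   t.w * scale, t.h * scale);
//   }
```

For the 16667x16667 mask from this issue and a 4096-pixel tile limit, this yields a 5x5 grid of 25 tiles, each small enough to decode and scale independently.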
Debugging tips:
If you are going to debug this issue with a debugger, consider adding #disableWorker=true to the URL. Otherwise you have to account for the fact that the logic of src/core runs in a Web Worker, while the canvas logic runs on the main thread.
Hi, I would like to work on this issue, but I have a couple of questions regarding @Rob--W's comment above:
To scale down the image, it is probably best to detect large images...
- What's the benchmark for large images? It would be helpful if you can specify the resolution (in w×h) which may be labelled as large.

An alternative approach is to manually transform the image, i.e. interpreting the pixels of the image yourself and interpolate the pixel values while scaling.

- What do you mean by interpreting the pixels and interpolating the pixel values? Are you referring to bilinear interpolation? Would the implementation be similar to https://github.com/mozilla/pdf.js/blob/master/src/core/colorspace.js#L34-L56?
Also, if possible, please point me to appropriate resources which I can study to go about implementing the solution.
Hi @apoorv-mishra
To scale down the image, it is probably best to detect large images...
- What's the benchmark for large images? It would be helpful if you can specify the resolution (in w×h) which may be labelled as large.
I don't have a specific value in mind, but I was thinking of big images whose width/height are significantly larger than the actually painted image, to the extent that the native image decoder would be unable to handle it, or that it would require an excessive amount of memory. You can start with a hard-coded value (that you find experimentally by trying to paint images of that size, and/or by looking at other parts of PDF.js where a maximum image/canvas size is enforced). If needed, we can make the logic more complex later (e.g. deciding whether an image is too large based on the actual size of the painted image in the rendered PDF). But for now, let's keep it simple and choose a reasonable hard-coded threshold.
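The hard-coded threshold could start out as simple as the following sketch; the constant here is a made-up starting point to be tuned experimentally, not a value used by pdf.js.

```javascript
// Illustrative heuristic for flagging "large" images. The threshold is a
// hypothetical starting point; find a real value by experimenting with
// what the native decoder/canvas can handle.
const MAX_IMAGE_PIXELS = 4096 * 4096; // ~16.8 megapixels

function isLargeImage(width, height) {
  return width * height > MAX_IMAGE_PIXELS;
}
```

The 16667x16667 mask from this issue (~278 megapixels) would be flagged, while typical page-sized images would not.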
An alternative approach is to manually transform the image, i.e. interpreting the pixels of the image yourself and interpolate the pixel values while scaling.
- What do you mean by interpreting the pixels and interpolating the pixel values? Are you referring to bilinear interpolation? Would the implementation be similar to https://github.com/mozilla/pdf.js/blob/master/src/core/colorspace.js#L34-L56?
By interpreting, I mean code that takes the pixel data and does something with it.
By interpolating, I mean interpolation in the mathematical sense: take a (large) group of pixels and calculate one pixel value that approximates the appearance of the original set of pixels. Bilinear interpolation is one of the possible ways to do it; you need to investigate the available options and see which one results in a drawing that is the best approximation of the original image. Search for "browser image resize algorithm" in your favorite search engine to learn more about how browsers scale images.
The resizeRgbImage function that you linked indeed looks like a good start.
Tip: When you link to code online, link to a specific commit instead of a branch (like "master"), because the code in the master branch can change and the line numbers can become different. On GitHub you can get a permalink to a specific commit by pressing the "y" key. So https://github.com/mozilla/pdf.js/blob/master/src/core/colorspace.js#L34-L56 becomes https://github.com/mozilla/pdf.js/blob/5b5781b45d234666241bf3354c0d390315c31d1a/src/core/colorspace.js#L34-L56
Also, if possible, please point me to appropriate resources which I can study to go about implementing the solution.
The code that I linked in my previous comments is a good start. At that point, the problem space has already been reduced from "an image in a PDF file" to "an image to be displayed". You don't need specialized knowledge of PDF to implement this.
If you want to go with the first method, you need to know how to work with the canvas API: https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API. The PDF.js code base already has several examples. To implement the alternative approach, you need to know what the bytes of the image data mean. I don't know this off the top of my head, but if you step through the code with a debugger you can probably see some useful information.
@calixteman Will this issue also be fixed by PR #16077?
The link above seems to be broken, but the document is available (as issue8076.pdf) in the PDF archive I shared a while back.
Yep, it's one of the files I used (and the PDF can be found in https://github.com/mozilla/pdf.js/issues/8076#issuecomment-314078517). I should add it to the test suite since it takes a specific path (it's a mask): https://github.com/mozilla/pdf.js/blob/d7e4be9cdbe37d4d4d9ada34208820470bcd14ed/src/core/image.js#L374
Link to PDF file: https://www.personalausweisportal.de/SharedDocs/Downloads/DE/Flyer-und-Broschueren/eID_Broschuere.pdf?__blob=publicationFile&v=1 (sha512sum 62e4a6a96257f219d1f1fc1c644eea86b17c056486a91a159e4cc0085347b3bc3c3d1a4e06eae204e1dce92f507afa5d55c21604584ffa852efb7fb303f9f846)
Configuration:
Steps to reproduce the problem:
What is the expected behavior? The last page renders like in the following screenshot of the page in Evince:
What went wrong? In Firefox: the page doesn't finish loading (the loading animation doesn't disappear) and the page is only partially rendered. In Chromium: the loading animation disappears, but the QR code is blank (white).
The PDF appears to have been created by "Adobe InDesign CS5.5 (7.5.3)".