mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
48.34k stars 9.97k forks source link

Incorrect Color Reproduction For Specific Color Spaces #9940

Closed brianholle closed 6 years ago

brianholle commented 6 years ago

Original PDF - Original_Test_Doc.pdf PDF Converted to Adobe RGB (1998) - Original_Test_Doc_RGB1998.pdf

Configuration:

Steps to reproduce the problem:

  1. Open https://mozilla.github.io/pdf.js/web/viewer.html
  2. Load PDF

What is the expected behavior? (add screenshot)

screen shot 2018-07-30 at 11 03 27 am

What went wrong? (add screenshot)

screen shot 2018-07-30 at 11 03 52 am

Using the Original_Test_Doc.pdf file above in the viewer produces an incorrect color. After inspecting the attached pdf file, I believe the color type for the image that is discolored is of type "Spot Color." From what I can find online, it seems that converting this color type into sRGB (The format I think PDF.JS is using) is not very straight forward, so it is possible PDF.JS does not support it.

I tried converting the original PDF into the Adobe RGB (1998) color profile using Adobe Acrobat and the results seem to be accurate to the original PDF (Converted file attached at top).

Does PDF.JS support PDF's that contain images from these specific color spaces? Is there a list of acceptable color spaces somewhere in the wiki that I may have missed? If PDF.JS does not support this color space, what would be the recommended course of action to take in order to convert the PDF into something more PDF.JS friendly other than running each document through Adobe Acrobat?

timvandermeij commented 6 years ago

I think this is what you're looking for:

https://github.com/mozilla/pdf.js/blob/master/src/core/colorspace.js#L212-L255

Those are the colorspaces that we parse/support. If the colorspace is not supported, you should see an error in the console.

brianholle commented 6 years ago

I am not seeing an error in the console using the above PDF. Does this mean it is supported? If so, what would be causing this discoloration?

timvandermeij commented 6 years ago

I looked into this PDF file a bit more and it's using the CalRGB colorspace, which is supported, though it looks like something is going wrong in the conversion steps here. It looks like this is a bug in the colorspace conversion; thank you for reporting this.

brianholle commented 6 years ago

@timvandermeij I started to debug and may have found what is causing the issue.

colorspaces.CalRGBCS() line 939 ... this.MXB = matrix[3]; For my pdf, this value is being set to a negative number (-0.00002). This causes the check at line 964:

if (this.MXA < 0 || this.MYA < 0 || this.MZA < 0 || this.MXB < 0 || this.MYB < 0 || this.MZB < 0 || this.MXC < 0 || this.MYC < 0 || this.MZC < 0)

to fail and reset the values for: this.MXA, this.MYB, this.MZC, this.MXB, this.MYA, this.MZA, this.MXC, this.MYC, and this.MZB.

I noticed that by modifying the negative matrix value to a positive value and forcing the test to pass at run time, the resulting pdf's color is MUCH closer to the correct result.

I am trying to track back to where this value is first set. The furthest i can find is in the method "loadResources" in document.js , which eventually creates the value xobj (which i believe contains Stream/Dict/xRef ... /map/matrix) in Evaluator.js. Is this value set by PDF.JS or is it generated from the contents of my pdf?

I am hoping that if I can fully understand where/how exactly this matrix variable is being set, that I will be able to determine what is generating this negative value and causing my issues.

Sorry for any confusions with the above text, this is my first time working with JS. Please ask for clarification if any of the above does not make sense.

brendandahl commented 6 years ago

Have you tried removing part that checks that each matrix value is < 0? I don't see anything in the spec saying that these numbers can't be negative.

brianholle commented 6 years ago

This seems to correct my issues. Are there any unseen consequences of removing this check (I have yet to run tests)? Also, for future reference, what specs are you referring to?

If there is no reason for this check to be in the code, would you like me to contribute my changes/generate a PR?

brendandahl commented 6 years ago

I don't see any issues with removing that check on the matrix values. It's normal to have negative values in a transformation matrix.

I'm referring to the PDF spec portion on CalRGB https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1850396

A PR would be great! If you can, please also add this file as an "eq" ref test file so we don't regress this issue. For an example on doing this, see the changes to test/test_manifest.json, test/pdfs/.gitignore in https://github.com/mozilla/pdf.js/pull/9827/files