mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
47.06k stars, 9.81k forks

PDF/A-2B Checkboxes status wrongly rendered #18175

Open miri0001 opened 1 month ago

miri0001 commented 1 month ago

Attach (recommended) or Link to PDF file here: checkboxes-issues.pdf

Configuration:

Steps to reproduce the problem:

  1. Open the attached PDF in the viewer at https://mozilla.github.io/pdf.js/web/viewer.html, or directly in Mozilla Firefox (the 'Open with' option).
  2. All checkboxes are displayed as checked: image
  3. A correct rendering of this PDF should not show every checkbox checked, only one checkbox per group (see the expected behavior below).

This PDF went through several conversions, which may explain the issue.

What is the expected behavior? (add screenshot)

  1. When the PDF is opened directly in a non-Firefox browser (Chrome, Safari, Edge) or in Adobe Acrobat, the checkbox states are displayed correctly:

image

What went wrong? (add screenshot)

PDF.js: image

Mozilla Firefox 115.11.0esr (64-bit): image

miri0001 commented 2 weeks ago

Here is another sample of the issue: RAW_fmh-directives-anticipees-courte-fr.pdf

The attached PDF document synchronizes the state of one checkbox (in the main content) with the state of another checkbox (in the final summary on the last page). This works in every PDF reader I could test. However, it fails in the latest PDF.js version, 4.4.28, at https://mozilla.github.io/pdf.js/web/viewer.html

Acrobat Reader: image

PDF.js 4.4.28: image

calixteman commented 2 weeks ago

This PDF is a bit strange. The first three checkboxes have a common parent whose value is set to Ja (I guess "yes" in German): image

Since they have the same parent, they have the same value (which is set in the V entry on the parent). In either Acrobat or Chrome, clicking the first checkbox shows the second in the same state as the first one. The third is not checked because its Ja appearance is almost empty, but if you click on it you switch to the Drittes appearance (which is a mark); since the others don't have this Drittes value, they are then shown as empty!

So the only bug I see here is that we show a check mark when the third checkbox is in the Ja state, but I think the PDF isn't really working correctly because it's just badly designed.
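The behavior described in this comment can be sketched as a small model. This is only an illustration under stated assumptions, not PDF.js code: the names `Widget`, `appearance_state`, and `render_all` are hypothetical, the strings "mark" and "empty" stand in for what each appearance draws, and the fallback to an Off state for a widget that lacks the parent's value is assumed from the description above.

```python
from dataclasses import dataclass

OFF = "Off"


@dataclass
class Widget:
    """One checkbox widget (hypothetical model, not a PDF.js type)."""
    name: str
    # Maps an appearance-state name to what that appearance draws.
    appearances: dict


def appearance_state(widget, parent_value):
    """Pick the state a widget shows for the value stored on the shared parent."""
    # Assumption: a widget without an appearance for the value falls back to Off.
    return parent_value if parent_value in widget.appearances else OFF


def render_all(widgets, parent_value):
    """Return what each widget draws when the parent's V entry is parent_value."""
    return {
        w.name: w.appearances.get(appearance_state(w, parent_value), "empty")
        for w in widgets
    }


# The three checkboxes from the comment, as described there.
WIDGETS = [
    Widget("first", {"Ja": "mark", OFF: "empty"}),
    Widget("second", {"Ja": "mark", OFF: "empty"}),
    # The third widget's Ja appearance is (almost) empty; Drittes is the mark.
    Widget("third", {"Ja": "empty", "Drittes": "mark", OFF: "empty"}),
]

if __name__ == "__main__":
    print(render_all(WIDGETS, "Ja"))
    print(render_all(WIDGETS, "Drittes"))
```

In this model the third checkbox draws nothing while the parent value is Ja, which is the case where the comment says PDF.js shows a check mark instead.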

miri0001 commented 2 weeks ago

@calixteman Thank you very much for looking at it. The author of the PDF has published a brand-new version, as described in my comment from yesterday: https://github.com/user-attachments/files/15793562/RAW_fmh-directives-anticipees-courte-fr.pdf. This new version of the form is displayed 'strangely' in PDF.js but is displayed fine in every other PDF reader I could test.

In that specific PDF, the value of one checkbox in the main content seems to be synchronized automatically with the state of a checkbox in the final section at the end of the PDF.

May I ask whether you see that behavior as a consequence of a wrongly formatted PDF form, or whether something could be improved in PDF.js? Thank you!
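One way the synchronization described in this comment can arise is when both checkboxes are widgets of a single form field and therefore share one value. The sketch below assumes that layout without having inspected the file, so it does not settle whether the form or PDF.js is at fault. `CheckboxField` and the "Yes" state name are hypothetical and chosen only for illustration.

```python
class CheckboxField:
    """One form field whose single value is shared by all of its widgets."""

    def __init__(self, on_states):
        # on_states[i] is the "on" state name of widget i (hypothetical layout).
        self.on_states = list(on_states)
        self.value = "Off"

    def toggle(self, index):
        """Simulate a click on widget `index`: flip between Off and its on state."""
        on = self.on_states[index]
        self.value = "Off" if self.value == on else on

    def checked(self):
        """A widget appears checked when the shared value equals its on state."""
        return [self.value == on for on in self.on_states]


if __name__ == "__main__":
    # Widget 0: checkbox in the main content; widget 1: checkbox in the summary.
    field = CheckboxField(["Yes", "Yes"])
    field.toggle(0)
    print(field.checked())
```

Because both widgets use the same on-state name here, clicking either one changes the state of both. A viewer that tracked each widget's state separately would not reproduce that.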