veraPDF / veraPDF-library

Industry supported, open source PDF/A validation library
http://verapdf.org/software
GNU General Public License v3.0

PDF/A-2B validation result difference between veraPDF and latest version of Adobe Preflight #1397

Open rogerk-apryse opened 9 months ago

rogerk-apryse commented 9 months ago

I tried one version of PreFlight to validate conformance with PDF/A level 2A and I got the same error that VeraPDF shows.

However, this is strange because, stepping through the code, I verified that the glyph we use with character code 33 has width 250 in both the PDF font dictionary and the embedded font file. The PDF font dictionary specifies that character code 34 has a width of 0, but we do not use character code 34 anywhere in the page content stream, and the PDF/A rule for this applies only to characters that are used and rendered with a non-invisible text rendering mode.
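To make the expectation above concrete, here is a minimal sketch of how I read the rule. It is not veraPDF's or Preflight's actual code: the `UsedGlyph` type, the `inconsistent_widths` function, and the `tolerance` value are all assumptions made for illustration. Only glyphs that are actually used, and not drawn with the invisible text rendering mode 3, are compared.

```python
from dataclasses import dataclass

# Text rendering mode 3 draws nothing (neither fill nor stroke).
INVISIBLE_RENDER_MODE = 3


@dataclass
class UsedGlyph:
    """A glyph occurrence found in a content stream (hypothetical type)."""
    code: int          # character code in the content stream
    render_mode: int   # text rendering mode in effect when it was shown


def inconsistent_widths(used, dict_widths, program_widths, tolerance=1.0):
    """Return codes whose dictionary and font program widths disagree.

    `tolerance` is an assumed value, not taken from any validator.
    """
    bad = []
    for glyph in used:
        if glyph.render_mode == INVISIBLE_RENDER_MODE:
            continue  # invisible text is outside the scope of the check
        dict_w = dict_widths.get(glyph.code)
        prog_w = program_widths.get(glyph.code)
        if dict_w is None or prog_w is None or abs(dict_w - prog_w) > tolerance:
            bad.append(glyph.code)
    return bad


# Widths as I observed them: code 33 is 250 in both places, code 34 is unused.
dict_widths = {33: 250, 34: 0}
program_widths = {33: 250}
used = [UsedGlyph(code=33, render_mode=0)]
result = inconsistent_widths(used, dict_widths, program_widths)
```

Under this reading `result` is empty for our file, which is why the reported error looked surprising to me.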

I used Acrobat Pro version 11.0.23 and it shows the font widths error. But then I used the newer Acrobat Pro DC version 2015.006.30526 and this one sure enough seems to have fixed the issue and no longer shows the font widths error.

Can you please check if there might be a fix that has to be done to the verification for this issue in veraPDF?

This is the veraPDF error:

" Rule 6.2.11.5-1 Requirement For every font embedded in a conforming file and used for rendering, the glyph width information in the font dictionary and in the embedded font program shall be consistent.

root/document[0]/pages[0](11 0 obj PDPage)/contentStream[0]/operators[5]/usedGlyphs[0](AAAAAC+TimesNewRomanPSMT AAAAAC+TimesNewRomanPSMT 33 0 0) "

Attaching the PDF file (essai-a.pdf) and a screenshot from PreFlight.

Thanks Roger

bdoubrov commented 9 months ago

@rogerk-apryse thanks a lot for raising this issue

First, note that you probably refer to PDF/A-2b, not PDF/A-2a, as the attached PDF document is identified as PDF/A-2b.

The issue with the attached file is that the embedded font in question (AAAAAC+TimesNewRomanPSMT) is invalid. Its Charset entry assigns the same glyph name "space" to two different glyphs (GID=1 and GID=2). These two glyphs actually have different widths of 0 and 250. As a result, the choice of glyph is ambiguous, and I guess veraPDF and Acrobat Preflight use different logic for glyph selection.
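The ambiguity can be illustrated with a small sketch. This is not the actual code of veraPDF or Acrobat; the function names are hypothetical, the `.notdef` width is made up, and which of the two "space" glyphs carries which width is an assumption here. It only shows how two equally plausible lookup strategies return different widths for the same glyph name.

```python
from collections import defaultdict

# GID -> glyph name and GID -> width. Illustrative values only: the .notdef
# width and the order of the two "space" widths are assumptions.
CHARSET = [".notdef", "space", "space"]
WIDTHS = [778, 0, 250]


def duplicate_glyph_names(charset):
    """Return {name: [gids]} for every glyph name assigned more than once."""
    gids_by_name = defaultdict(list)
    for gid, name in enumerate(charset):
        gids_by_name[name].append(gid)
    return {name: gids for name, gids in gids_by_name.items() if len(gids) > 1}


def width_first_match(name, charset, widths):
    """Strategy A: take the first GID carrying the requested name."""
    return widths[charset.index(name)]


def width_last_match(name, charset, widths):
    """Strategy B: build a name -> GID map, so later GIDs overwrite earlier ones."""
    gid_by_name = {n: gid for gid, n in enumerate(charset)}
    return widths[gid_by_name[name]]


duplicates = duplicate_glyph_names(CHARSET)
first = width_first_match("space", CHARSET, WIDTHS)
last = width_last_match("space", CHARSET, WIDTHS)
```

With these assumed values, one strategy sees width 0 and the other sees width 250, so only one of them agrees with the width of 250 in the font dictionary.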

We'll log a warning that embedded font program is incorrect and will also try to adjust our logic for glyph selection in such ambiguous cases to match Acrobat.

rogerk-apryse commented 9 months ago

Hello and thanks for the good explanation. Roger

bdoubrov commented 8 months ago

We have added a warning on duplicated glyph names in the embedded font program in the latest dev build of veraPDF

rogerk-apryse commented 8 months ago

Thanks for the update! Roger

bdoubrov commented 3 months ago

Added to the latest veraPDF release 1.26

rogerk-apryse commented 3 months ago

Thanks!

dgt-amexio commented 1 month ago

Hello,

Could you kindly confirm in which version this fix is supposed to be applied? From your comment, it should be in 1.26. I tested against 1.26.1 as available here, but I can still reproduce the issue with the file attached to this ticket.

I also noticed that a 1.26.2 release occurred (from the repo tag), but this release is not exposed.

Kind regards

bdoubrov commented 1 month ago

The fix (i.e. the new log message on the duplicated glyph names) was added only to the Greenfield version of veraPDF and is, unfortunately, not available in the PDFBox version. I do suggest switching to the Greenfield version of veraPDF, as the PDFBox one will not be supported starting from 1.30 and already lacks some fixes made only in the Greenfield version.

dgt-amexio commented 1 month ago

Hello, Thx for your feedback.

I confirm that, using the Greenfield version of veraPDF 1.26.1, I get a warning message in the logs about the "duplicate" glyph name space.

But the PDF/A validation still fails with veraPDF 1.26.1 on the sample document. Should we consider that the "fix" is limited to adding a warning in the log and does not change the validation behaviour in such cases?

From previous posts in this thread, I had understood that the glyph selection logic would also change: "We'll log a warning that embedded font program is incorrect and will also try to adjust our logic for glyph selection in such ambiguous cases to match Acrobat." If the behaviour has not yet changed, may I suggest keeping this issue open?

Kind regards Denis

bdoubrov commented 1 month ago

No, indeed, we haven't changed the logic of glyph selection in case of duplicate glyph names. Strictly speaking, this case is not covered by any font, PDF or PDF/A requirements and becomes implementation dependent. Even if we do this and the validation error disappears, the preservation risk would still be here due to ambiguity of glyph selection in invalid fonts.

I'll reopen the issue anyway to be able to discuss this further.