veraPDF / veraPDF-library

Industry supported, open source PDF/A validation library
http://verapdf.org/software
GNU General Public License v3.0

Only 1 claimed conformance level is checked for PDFs with multiple conformance levels #1414

Open petervwyatt opened 6 months ago

petervwyatt commented 6 months ago

For PDF files that claim 2 (or more) conformance levels to different standards (e.g. PDF/A-4 and PDF/UA-2), veraPDF appears to only validate against one when PDF-Flavour "Auto-detect" is selected. This caught me out as I assumed it checked all conformance levels...

Ideally, could veraPDF in "Auto-detect" mode be made to check all claimed conformance levels?

Otherwise, could an explicit line be added to the report output noting the other conformance level(s) that were detected but not checked?

bdoubrov commented 6 months ago

Yes, this is correct: when a document declares conformance to both PDF/A and PDF/UA, the auto-detect mode picks the PDF/A profile. We'll check if this can be changed so that both PDF/A and PDF/UA are checked in this case.

bdoubrov commented 5 months ago

@petervwyatt What would be the expectation if the document claims conformance to incompatible substandards, such as PDF/A-1a and PDF/UA-2?

If the substandards are compatible, we can parse the document only once and then apply two different validation profiles. However, if the substandards are based on different base PDF specifications, this would require two different parsing logics (predefined CMaps are different, standard structure types are different and so on).
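The parse-once idea above can be sketched as grouping the claimed parts by their base PDF specification, so that each group needs only one parsing pass. This is a minimal Python illustration, not veraPDF code: the helper name `parse_passes` is hypothetical, and the table lists only the base specifications named in the respective ISO standards.

```python
from collections import defaultdict

# Base PDF specification of each substandard part.
BASE_SPEC = {
    "PDF/A-1": "PDF 1.4",
    "PDF/A-2": "PDF 1.7",
    "PDF/A-3": "PDF 1.7",
    "PDF/A-4": "PDF 2.0",
    "PDF/UA-1": "PDF 1.7",
    "PDF/UA-2": "PDF 2.0",
}

def parse_passes(claims):
    """Group claimed parts by base spec (hypothetical helper).

    Each key of the result corresponds to one parsing pass; the list
    under it holds the validation profiles to apply after that pass.
    """
    groups = defaultdict(list)
    for claim in claims:
        groups[BASE_SPEC[claim]].append(claim)
    return dict(groups)

# Same base spec: one parse, two profiles.
print(parse_passes(["PDF/A-4", "PDF/UA-2"]))
# Different base specs: two separate parsing passes would be needed.
print(parse_passes(["PDF/A-1", "PDF/UA-2"]))
```

Keying on the part rather than the full conformance level (PDF/A-1 rather than PDF/A-1a) is a simplification: the level changes which rules apply, not which base specification the file is parsed against.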

petervwyatt commented 5 months ago

@bdoubrov I'd assume you would test for both/all, since, for example, trivial PDFs (e.g. just a red box) could easily (and unnecessarily) meet multiple PDF/A conformance levels. That's not an error, just highly unusual...

bdoubrov commented 5 months ago

@petervwyatt I don't think PDF/A (or any other substandard) would allow identification with multiple parts. Duplicated XMP properties would conflict. But I see your point: having both PDF/A-1b and PDF/UA-1 might well be acceptable, even if the first is based on PDF 1.4 and the second on PDF 1.7.
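To illustrate the identification point: PDF/A and PDF/UA each declare their part through a single XMP property in their own namespace (`pdfaid:part` and `pdfuaid:part`), so a document can carry one claim per family but not two parts of the same family. The Python sketch below reads those properties from a simplified XMP packet. The helper `read_claims` and the sample packet are hypothetical and are not taken from veraPDF.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Identification schema namespaces used by PDF/A and PDF/UA.
FAMILIES = {
    "PDF/A": "http://www.aiim.org/pdfa/ns/id/",
    "PDF/UA": "http://www.aiim.org/pdfua/ns/id/",
}

def read_claims(xmp: str) -> dict:
    """Return {family: part} for each identification schema found."""
    root = ET.fromstring(xmp)
    claims = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        for family, ns in FAMILIES.items():
            key = f"{{{ns}}}part"
            # The property may be written as an attribute or a child element.
            value = desc.get(key)
            if value is None:
                child = desc.find(key)
                value = child.text if child is not None else None
            if value is not None:
                claims[family] = value.strip()
    return claims

# Simplified sample packet (assumption: real packets carry more metadata).
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
      pdfaid:part="1" pdfaid:conformance="B"/>
  <rdf:Description rdf:about=""
      xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
   <pdfuaid:part>1</pdfuaid:part>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(read_claims(SAMPLE))
```

Because the result is keyed by family, a second `pdfaid:part` value would simply overwrite the first, which mirrors why a file cannot meaningfully identify as two PDF/A parts at once.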

bdoubrov commented 3 months ago

@petervwyatt

Here is the report from the prototype implementation of this feature:

htmlReport.zip

Is this what is expected?

petervwyatt commented 3 months ago

Hi @bdoubrov, looks good to me! 👍

One minor suggestion: visually separate each individual validation section with an <hr/>. If there are errors, a section could be quite long, so seeing where the sections change is important. A "back to summary (top)" link (or similar wording) at the end of each section would also make potentially lengthy reports easier to navigate.

bdoubrov commented 1 month ago

The feature is implemented in the latest dev builds of the CLI and Desktop installer. The autodetect option now detects all conformance claims in the document and validates them all, assuming they are based on the same PDF specification. If the conformance claims are based on different PDF specs (for example, PDF/A-1 and PDF/UA-2), preference is given to PDF/A, then PDF/UA, and only then WTPDF claims. Incompatible claims are skipped with a warning.
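The selection rule described above could look roughly like the following Python sketch. It is an illustration only, not the veraPDF implementation: the `Claim` type and `select_claims` function are hypothetical, and treating "compatible" as an exact match of base specification is an assumption that the real autodetect logic may not share.

```python
from dataclasses import dataclass

# Preference order as described: PDF/A, then PDF/UA, then WTPDF.
PRIORITY = ["PDF/A", "PDF/UA", "WTPDF"]

@dataclass(frozen=True)
class Claim:
    family: str      # "PDF/A", "PDF/UA" or "WTPDF"
    label: str       # e.g. "PDF/A-1a"
    base_spec: str   # e.g. "PDF 1.4"

def select_claims(claims):
    """Return (claims to validate, warnings for skipped claims)."""
    if not claims:
        return [], []
    # The highest-priority claim fixes the base spec used for parsing.
    ordered = sorted(claims, key=lambda c: PRIORITY.index(c.family))
    base = ordered[0].base_spec
    selected = [c for c in ordered if c.base_spec == base]
    # Assumption: any claim on a different base spec counts as incompatible.
    warnings = [
        f"{c.label} skipped: based on {c.base_spec}, not {base}"
        for c in ordered
        if c.base_spec != base
    ]
    return selected, warnings

selected, warnings = select_claims([
    Claim("PDF/UA", "PDF/UA-2", "PDF 2.0"),
    Claim("PDF/A", "PDF/A-1a", "PDF 1.4"),
])
print([c.label for c in selected])
print(warnings)
```

In this sketch the PDF/A-1a claim wins because PDF/A ranks first, and the PDF/UA-2 claim is reported in `warnings` rather than silently dropped.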

For example: