latex3 / tagging-project

Issues related to the tagging project
https://latex3.github.io/tagging-project/
LaTeX Project Public License v1.3c

XMP error in the LaTeX WTPDF examples. #375

Open ozross opened 3 months ago

ozross commented 3 months ago

There is a really subtle error in the XMP packet in all of the PDF 2.0 example documents. You need to validate the XMP to find it, which you can do here: https://www.pdflib.com/pdf-knowledge-base/xmp/free-xmp-validator/

The attached picture says it all.

[Image: XMP-error-pdfua-rev]

That is, the pdfaid property being added via a pdfaExtension schema is used as 'rev' but declared erroneously as 'year'.

After I changed 'year' to 'rev' in one of my own PDFs (made using the pdfx2 package), it passed validation. Presumably this is also just a one-word typo in one of the LaTeX source files.
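As a rough illustration of this kind of mismatch, the following Python sketch (standard library only; all helper names are hypothetical) compares the property names that a pdfaExtension schema declares for a namespace with the properties actually set on rdf:Description elements. It assumes the element form of schema entries (rdf:parseType="Resource") and does not cover every RDF serialization XMP allows. Predefined properties that need no extension schema would also show up as "undeclared", so the report is only a hint, not a validator.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and the PDF/A extension-schema vocabulary.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pdfaExtension": "http://www.aiim.org/pdfa/ns/extension/",
    "pdfaSchema": "http://www.aiim.org/pdfa/ns/schema#",
    "pdfaProperty": "http://www.aiim.org/pdfa/ns/property#",
}


def declared_properties(root):
    """Map each extension-schema namespace URI to the property names it declares."""
    declared = {}
    for schema in root.iterfind(".//pdfaExtension:schemas/rdf:Bag/rdf:li", NS):
        uri = schema.findtext("pdfaSchema:namespaceURI", default="", namespaces=NS).strip()
        names = declared.setdefault(uri, set())
        for name in schema.iterfind(
                "pdfaSchema:property/rdf:Seq/rdf:li/pdfaProperty:name", NS):
            names.add((name.text or "").strip())
    return declared


def used_properties(root, uri):
    """Local names of properties in namespace `uri` set on any rdf:Description,
    either as attributes or as child elements."""
    prefix = "{" + uri + "}"
    used = set()
    for desc in root.iter("{" + NS["rdf"] + "}Description"):
        for key in desc.attrib:
            if key.startswith(prefix):
                used.add(key[len(prefix):])
        for child in desc:
            if child.tag.startswith(prefix):
                used.add(child.tag[len(prefix):])
    return used


def check_extension_schemas(xmp_text):
    """For each declared namespace, report used-but-undeclared and declared-but-unused names."""
    root = ET.fromstring(xmp_text)
    report = {}
    for uri, declared in declared_properties(root).items():
        used = used_properties(root, uri)
        report[uri] = {"undeclared": used - declared, "unused": declared - used}
    return report


# Hypothetical minimal packet reproducing the reported mismatch:
# pdfaid:rev is used, but the extension schema declares "year".
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
      pdfaid:rev="2020"/>
  <rdf:Description rdf:about=""
      xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
      xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
      xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">
   <pdfaExtension:schemas>
    <rdf:Bag>
     <rdf:li rdf:parseType="Resource">
      <pdfaSchema:namespaceURI>http://www.aiim.org/pdfa/ns/id/</pdfaSchema:namespaceURI>
      <pdfaSchema:prefix>pdfaid</pdfaSchema:prefix>
      <pdfaSchema:property>
       <rdf:Seq>
        <rdf:li rdf:parseType="Resource">
         <pdfaProperty:name>year</pdfaProperty:name>
         <pdfaProperty:valueType>Integer</pdfaProperty:valueType>
         <pdfaProperty:category>internal</pdfaProperty:category>
        </rdf:li>
       </rdf:Seq>
      </pdfaSchema:property>
     </rdf:li>
    </rdf:Bag>
   </pdfaExtension:schemas>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(check_extension_schemas(SAMPLE))
# {'http://www.aiim.org/pdfa/ns/id/': {'undeclared': {'rev'}, 'unused': {'year'}}}
```

Replacing `year` with `rev` in the pdfaProperty:name element makes both sets empty, which matches the one-word fix described above.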

ozross commented 3 months ago

And there are 2 other places that I think are wrong too.

Both <pdfaField:valueType>Bag claim</pdfaField:valueType> and <pdfaField:valueType>Bag declaration</pdfaField:valueType> should be just <pdfaField:valueType>Bag</pdfaField:valueType>.

u-fischer commented 3 months ago

> That is, the pdfaid property being added via a pdfaExtension schema is used as 'rev' but declared erroneously as 'year'.

Right. Thanks for the report. They changed that in PDF/A-4.

> And there are 2 other places that I think are wrong too.

> Both <pdfaField:valueType>Bag claim</pdfaField:valueType> and <pdfaField:valueType>Bag declaration</pdfaField:valueType> should be just <pdfaField:valueType>Bag</pdfaField:valueType>.

No, I think they are OK. I got this schema from https://pdfa.org/wp-content/uploads/2023/12/PDF-Declarations.pdf (the PDF has an attachment with an example schema).

There is a resource which defines the type claim:

<pdfaType:type>claim</pdfaType:type> and Bag claim is then a Bag of such claims.

Similarly, there is a resource defining declaration:

<pdfaType:type>declaration</pdfaType:type> and this then handles Bag declaration.

See also https://pdfa.org/wp-content/uploads/2011/09/tn0009_xmp_extension_schemas_in_pdfa-1_2008-03-20.pdf about how new field values can be defined.
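The resolution rule described here can be sketched in Python. This is a hypothetical, standard-library-only illustration of the reading that a valueType of the form `Bag <name>` refers to a type declared with pdfaType:type elsewhere in the packet; it does not settle whether validators should accept that form, which is the open question in this thread. The built-in type list is deliberately incomplete, the field names in the sample packet are placeholders, and only the element form (rdf:parseType="Resource") is handled.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pdfaSchema": "http://www.aiim.org/pdfa/ns/schema#",
    "pdfaProperty": "http://www.aiim.org/pdfa/ns/property#",
    "pdfaType": "http://www.aiim.org/pdfa/ns/type#",
    "pdfaField": "http://www.aiim.org/pdfa/ns/field#",
}

# Hypothetical, deliberately incomplete list of predefined XMP value types.
BUILTIN_TYPES = {"Text", "Integer", "Real", "Boolean", "Date", "URI", "URL",
                 "AgentName", "ProperName", "Locale", "MIMEType", "GUID"}
CONTAINERS = {"Bag", "Seq", "Alt"}


def custom_types(root):
    """Type names declared via pdfaSchema:valueType/rdf:Seq/rdf:li/pdfaType:type."""
    return {
        (t.text or "").strip()
        for t in root.iterfind(".//pdfaSchema:valueType/rdf:Seq/rdf:li/pdfaType:type", NS)
    }


def classify(value_type, customs):
    """Split a valueType such as 'Bag claim' into (container, base, status)."""
    if value_type == "Lang Alt":
        return ("Alt", "Lang", "builtin")
    parts = value_type.split()
    if len(parts) == 1 and parts[0] in CONTAINERS:
        return (parts[0], None, "bare container")
    if len(parts) == 1:
        container, base = None, parts[0]
    elif len(parts) == 2 and parts[0] in CONTAINERS:
        container, base = parts
    else:
        return (None, value_type, "malformed")
    if base in BUILTIN_TYPES:
        status = "builtin"
    elif base in customs:
        status = "custom"
    else:
        status = "unknown"
    return (container, base, status)


def classify_all(xmp_text):
    """Classify every pdfaProperty:valueType and pdfaField:valueType in the packet."""
    root = ET.fromstring(xmp_text)
    customs = custom_types(root)
    results = []
    for tag in ("pdfaProperty:valueType", "pdfaField:valueType"):
        for el in root.iterfind(".//" + tag, NS):
            vt = (el.text or "").strip()
            results.append((vt, classify(vt, customs)))
    return results


# Hypothetical fragment: two custom types, where 'declaration' has a
# placeholder field whose valueType is 'Bag claim'.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
    xmlns:pdfaType="http://www.aiim.org/pdfa/ns/type#"
    xmlns:pdfaField="http://www.aiim.org/pdfa/ns/field#">
 <rdf:Description rdf:about="">
  <pdfaSchema:valueType>
   <rdf:Seq>
    <rdf:li rdf:parseType="Resource">
     <pdfaType:type>claim</pdfaType:type>
     <pdfaType:field><rdf:Seq><rdf:li rdf:parseType="Resource">
      <pdfaField:name>exampleUri</pdfaField:name>
      <pdfaField:valueType>URI</pdfaField:valueType>
     </rdf:li></rdf:Seq></pdfaType:field>
    </rdf:li>
    <rdf:li rdf:parseType="Resource">
     <pdfaType:type>declaration</pdfaType:type>
     <pdfaType:field><rdf:Seq><rdf:li rdf:parseType="Resource">
      <pdfaField:name>exampleClaims</pdfaField:name>
      <pdfaField:valueType>Bag claim</pdfaField:valueType>
     </rdf:li></rdf:Seq></pdfaType:field>
    </rdf:li>
   </rdf:Seq>
  </pdfaSchema:valueType>
 </rdf:Description>
</rdf:RDF>"""

print(classify_all(SAMPLE))
# [('URI', (None, 'URI', 'builtin')), ('Bag claim', ('Bag', 'claim', 'custom'))]
```

Listing every valueType at once, rather than stopping at the first failure, can help when fixing one entry only moves the reported error to the next one.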

ozross commented 3 months ago

That seems a reasonable explanation (about Bag <whatever>), but it doesn't match this result:

[Screenshot 2024-07-31 at 8 37 04 pm]

when I test the XMP packet from one of the examples. If Bag claim is changed to just Bag, then it is Bag declaration that is now reported as failing.

So there must be more to this, irrespective of what is in the PDF-Declarations.pdf example. Besides, if you check PDF-Declarations.pdf at the XMP validator site, then you get:

[Screenshot 2024-07-31 at 8 58 17 pm]

so it shouldn't be any surprise that there are other errors within that document.

And if you save the attached file "A Extension Schema.xmp" to disk and then validate it, you get the same error:

[Screenshot 2024-07-31 at 9 09 00 pm]

And the TN document has no example of Bag <something>, only what is shown in the following image:

[Screenshot 2024-07-31 at 9 22 07 pm]

which may be taken to imply that this usage of Bag is generally valid, but I wouldn't trust such a presumption when at least one validator demonstrably rejects it.

u-fischer commented 3 months ago

I opened an issue at https://github.com/pdf-association/pdf-issues/issues/458.

davidcarlisle commented 3 months ago

The 2.0 and 1.7 examples at #72 have been updated to fix the declaration of pdfaid:rev and also to improve support for tagging hyphenation at line breaks.

ozross commented 3 months ago

@u-fischer , @davidcarlisle Thanks for acting on this quickly.

Ulrike, was year ever used in the XMP for any PDF/A version? I cannot find it in the original ISO standards documents for either PDF/A-2 or PDF/A-3. These do have pdfaid:amd and pdfaid:corr, with the year included after a semicolon. There may have been later updates that I do not have.

Also, I cannot find any usage of Bag <anything> in the XMP specification documents at https://github.com/adobe/XMP-Toolkit-SDK/blob/main/docs/ except when <anything> is one of the basic RDF types. Surely it would at least have to be a predefined type, not something declared (self-referentially) within the same resource?

davidcarlisle commented 3 months ago

@ozross Yes, there will probably be more adjustment here, and as Ulrike says, she has opened an issue at the PDF Association asking for clarification. But the initial point you raised was a clear bug, so we regenerated all the files with just that fix, and will come back to the other issues.

u-fischer commented 3 months ago

> Ulrike, was year ever used with the XMP in any PDF/A version?

I don't know. I don't own the ISO standards and do not intend to spend money on them. Most of the entries we used were collected from external sources like hyperxmp, pdfx and the internet, and I tried to validate them with tools such as veraPDF.

> Also, I cannot find usage of Bag <anything> in the XMP specification

As I said, I opened an issue and am waiting for confirmation.