veraPDF / veraPDF-library

Industry supported, open source PDF/A validation library
http://verapdf.org/software
GNU General Public License v3.0

Metadata ISO 32000-1:2008, 14.3.2 key not found #1398

Closed alfieroddan closed 8 months ago

alfieroddan commented 8 months ago

Hi there, I hope you are well! Thank you for your hard work on this tool. I am looking for some help on why I'm failing test cases.

When running your checker on the PDF test_057_taggedparas.pdf, we get the following test failure:

Rule: [Specification: ISO 14289-1:2014, Clause: 7.1, Test number: 8](https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFUA-Part-1-rules#rule-71-8)
The Catalog dictionary of a conforming file shall contain the Metadata key whose value is a metadata stream as defined in ISO 32000-1:2008, 14.3.2.
Status: Failed, 1 occurrences

However, looking more closely at our PDF, we can see the following:

<<
/Lang (en-GB) /MarkInfo 23 0 R /Metadata 24 0 R /PageMode /UseNone /Pages 18 0 R /StructTreeRoot 22 0 R 
  /Type /Catalog /ViewerPreferences 21 0 R
>>
endobj
17 0 obj
<<
/Author (\(unauthored\)) /CreationDate (D:20000101000000+00'00') /Creator (\(unspecified\)) /Keywords () /ModDate (D:20000101000000+00'00') /Producer (\(ReportLab Internal,20100106154638\) RML2PDF http://www.reportlab.com) 
  /Subject (\(unspecified\)) /Title (Tagged) /Trapped /False
>>

...

>> /Type /StructTreeRoot
>>
endobj
23 0 obj
<<
/Marked true /Suspects false /UserProperties false
>>
endobj
24 0 obj
<<
/Length 6807 /SubType /XML /Type /Metadata
>>
stream

It seems as though we do include the Metadata key required for ISO 14289-1:2014, Clause: 7.1. Can you explain why this error has been thrown for the test case?

Thank you very much in advance for having a look.

bdoubrov commented 8 months ago

Thanks for reporting this issue. However, your attached PDF seems to be corrupt. Would you please reupload the file?

Based on the extracts of PDF internals in your message, it looks like the issue is related to the incorrect letter case of the /SubType key in this object:

24 0 obj
<<
/Length 6807 /SubType /XML /Type /Metadata
>>
stream

The correct key is /Subtype. As a result, the stream is not recognized as a metadata stream.
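PDF name objects are case-sensitive, so /SubType and /Subtype are two different keys. As a rough illustration of how this kind of mistake could be spotted before running a validator, here is a minimal Python sketch. It is not how veraPDF works; the function name `find_miscased_subtype` is hypothetical, and the regex-based scan assumes uncompressed objects and flat (non-nested) dictionaries like the one quoted above.

```python
import re

# Simplified patterns: these assume flat dictionaries in uncompressed objects.
DICT_RE = re.compile(rb"<<(.*?)>>", re.DOTALL)
IS_METADATA_RE = re.compile(rb"/Type\s*/Metadata\b")
NAME_RE = re.compile(rb"/([A-Za-z]+)")


def find_miscased_subtype(pdf_bytes: bytes) -> list[str]:
    """Return wrongly cased spellings of 'Subtype' found in metadata dictionaries."""
    found = []
    for match in DICT_RE.finditer(pdf_bytes):
        body = match.group(1)
        if not IS_METADATA_RE.search(body):
            continue  # not a /Type /Metadata dictionary
        for name in NAME_RE.findall(body):
            # Same letters as 'Subtype' but different case: a likely typo.
            if name.lower() == b"subtype" and name != b"Subtype":
                found.append(name.decode("ascii"))
    return found


if __name__ == "__main__":
    excerpt = b"24 0 obj\n<<\n/Length 6807 /SubType /XML /Type /Metadata\n>>\nstream\n"
    print(find_miscased_subtype(excerpt))
```

A real check would use a proper PDF parser, since object streams and nested dictionaries defeat this kind of byte scan.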

alfieroddan commented 8 months ago

Apologies, a non-corrupt PDF has now been uploaded: test_057_taggedparas.pdf

alfieroddan commented 8 months ago

After changing /SubType to /Subtype, there is no longer a metadata stream test failure.
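For anyone wanting to try the same change on an already generated file, a quick experiment could look like the Python sketch below. It is only an assumption about how one might test the fix, not what was actually done here, and the helper name `fix_subtype_case` is hypothetical. Because /SubType and /Subtype have the same length, a byte-for-byte replacement should leave the offsets in the cross-reference table unchanged. The proper fix still belongs in whatever produces the PDF.

```python
import re

# Match the exact misspelling, but not a longer name that merely starts with it.
BAD_KEY_RE = re.compile(rb"/SubType(?![A-Za-z0-9])")


def fix_subtype_case(pdf_bytes: bytes) -> bytes:
    """Replace /SubType with /Subtype without changing the file length."""
    patched = BAD_KEY_RE.sub(b"/Subtype", pdf_bytes)
    # Same length in and out, so xref byte offsets stay valid.
    assert len(patched) == len(pdf_bytes)
    return patched


if __name__ == "__main__":
    excerpt = b"24 0 obj\n<<\n/Length 6807 /SubType /XML /Type /Metadata\n>>\nstream\n"
    print(fix_subtype_case(excerpt).decode("ascii"))
```

Note that a blind byte replacement could in principle also touch matching bytes inside stream data, so the patched file should be re-validated rather than trusted.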

Thank you for your time on this!