johnwhitington / camlpdf

OCaml library for reading, writing and modifying PDF files

Exporting and then importing bookmarks breaks jhove validation #43

Closed averms closed 1 year ago

averms commented 4 years ago

Description

Exporting and then importing bookmarks (without changing anything) breaks jhove validation.

Steps to reproduce

  1. cpdf mwe.pdf -list-bookmarks >b.txt
  2. cpdf mwe.pdf -add-bookmarks b.txt -o fin.pdf
  3. jhove -m pdf-hul fin.pdf

The output is

Jhove (Rel. 1.24.1, 2020-03-16)
 Date: 2020-05-03 00:59:02 EDT
 RepresentationInformation: fin.pdf
  ReportingModule: PDF-hul, Rel. 1.12.2 (2019-12-10)
  LastModified: 2020-05-03 00:51:22 EDT
  Size: 111344
  Format: PDF
  Status: Not well-formed
  SignatureMatches:
   PDF-hul
  ErrorMessage: Invalid object number in cross-reference stream
   ID: PDF-HUL-80
   Offset: 111313
  MIMEtype: application/pdf

The original file, mwe.pdf, validates fine. It was created with PDFLaTeX and hyperref.

johnwhitington commented 4 years ago

I tried ghostscript on it, which reported no errors. It's normally very good at Xref error reporting.

However, I ran fin.pdf through https://www.pdf-online.com/osa/validate.aspx which showed:

Validating file "fin.pdf" for conformance level pdf1.5
The key Count is required but missing.
The value of the key Count is 0 but must be 3.

So I will fix that first, and then send you a new executable. You can then run it through your jhove tool and see what happens.

I looked into jhove and it seems old and half-abandoned, so I'm not inclined to trust its exact error message -- especially when other tools would spot an "invalid object number", but do not report one -- and if cpdf were generating invalid object numbers, it would have doubtless been reported before.

Please let me know what platform you're using.

averms commented 4 years ago

I'm not too tied to Jhove. Was just using that because I didn't know anything better existed. I might start using that pdf-online website from now on.

My platform is 64 bit Arch Linux.

johnwhitington commented 4 years ago

The pdf-online error is bogus, it turns out. I'm looking again at the jhove one.

johnwhitington commented 4 years ago

Jhove is being confused by null objects inside an object stream.

It's perfectly valid to have them, but it's not clear why adding the bookmarks is retaining these, and we would like to get rid of them.

Marking for the next release.

gorge:keycount john$ cpdf -list-bookmarks out.pdf >b.txt
gorge:keycount john$ cpdf -add-bookmarks b.txt out.pdf -o out2.pdf
gorge:keycount john$ cpdf -add-bookmarks b.txt mwe.pdf -o out2.pdf
gorge:keycount john$ cpdf -decompress out2.pdf -o decomp.pdf
gorge:keycount john$ mvim decomp.pdf <-- see too many nulls in objstm.
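
The last step above inspects the decompressed file by eye. As a rough illustration of what "nulls in an object stream" means, here is a minimal Python sketch that lists which objects in an already-decoded object stream are plain `null`. It is not part of cpdf or camlpdf: the function names `null_objects_in_objstm` and `build_objstm` are hypothetical, and the sample data is made up. It assumes that the stream has already been decompressed, that `n` and `first` are taken from the stream dictionary's /N and /First entries, that the header lists offsets in increasing order, and that there are no PDF comments between objects.

```python
def null_objects_in_objstm(decoded, n, first):
    """Return the object numbers whose body in this object stream is just `null`.

    decoded: decoded (decompressed) bytes of the object stream
    n:       value of /N in the stream dictionary (number of objects)
    first:   value of /First (byte offset of the first object body)
    """
    # The header is n pairs of "object-number offset"; offsets are relative to `first`.
    header = decoded[:first].split()
    pairs = [(int(header[2 * i]), int(header[2 * i + 1])) for i in range(n)]
    nulls = []
    for i, (objnum, offset) in enumerate(pairs):
        start = first + offset
        # Assumption: an object ends where the next one begins, or at end of data.
        end = first + pairs[i + 1][1] if i + 1 < n else len(decoded)
        if decoded[start:end].strip() == b"null":
            nulls.append(objnum)
    return nulls


def build_objstm(objects):
    """Hypothetical helper: build decoded object-stream bytes from (number, body) pairs."""
    bodies = b""
    header_parts = []
    for num, body in objects:
        header_parts.append("%d %d" % (num, len(bodies)))
        bodies += body + b" "
    header = (" ".join(header_parts) + " ").encode("ascii")
    return header + bodies, len(objects), len(header)


# Made-up sample: two real objects and two null objects.
data, n, first = build_objstm([
    (11, b"<< /Type /Outlines /Count 3 >>"),
    (12, b"null"),
    (13, b"(hello)"),
    (14, b"null"),
])
print(null_objects_in_objstm(data, n, first))  # [12, 14]
```

A validator that treats such entries as invalid object numbers would trip on objects 12 and 14 here, even though the stream itself is well-formed.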