yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License
1.82k stars, 271 forks

xref table not found at offset #507

Open ndvo opened 1 year ago

ndvo commented 1 year ago

I am getting an error parsing this file. The xref table does exist. Maybe less strict rules could be used to locate it.

The error is: PDF::Reader::MalformedPDFError Exception: xref table not found at offset ...

fail_to_parse.pdf
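
For anyone trying to narrow this down: a PDF normally ends with a `startxref` keyword followed by the byte offset of the xref section, and this error suggests that nothing recognizable was found at that offset. Below is a minimal diagnostic sketch in plain Ruby that compares the declared offset with where a stand-alone `xref` keyword actually sits. The `diagnose_xref` helper and its status symbols are hypothetical names made up for this illustration. They are not part of PDF::Reader, and this is not how the library itself resolves the table.

```ruby
# Hypothetical diagnostic helper; it is not part of PDF::Reader.
# Takes the raw bytes of a PDF and reports whether the offset declared
# after "startxref" really points at an xref section.
def diagnose_xref(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)

  # The file trailer ends with "startxref", the byte offset, then "%%EOF".
  marker = data.rindex("startxref")
  return { status: :no_startxref } if marker.nil?

  declared = data[marker..][/\Astartxref\s+(\d+)/, 1]
  return { status: :no_offset } if declared.nil?
  declared = declared.to_i

  # A classic table starts with "xref"; an xref stream starts with "N G obj".
  head = data[declared, 32].to_s
  return { status: :ok, declared: declared } if head.start_with?("xref")
  return { status: :xref_stream, declared: declared } if head.match?(/\A\d+\s+\d+\s+obj/)

  # Find the last stand-alone "xref" keyword, skipping the tail of "startxref".
  actual = nil
  pos = 0
  while (hit = data.index("xref", pos))
    actual = hit unless hit >= 5 && data[hit - 5, 5] == "start"
    pos = hit + 4
  end

  { status: :mismatch, declared: declared, actual: actual }
end
```

Calling `diagnose_xref(File.binread("fail_to_parse.pdf"))` and getting a `:mismatch` whose `actual` differs from `declared` by a small constant could point to extra bytes before the `%PDF-` header, which is one common way offsets end up shifted. That is only a guess about this particular file.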

paulsizer commented 3 months ago

@ndvo Did you get anywhere with this? I have just hit this error with a PDF I am trying to parse.

ndvo commented 3 months ago

Well, my goal was rather limited. I was only parsing the PDF to check whether it was protected. I ended up adding a rescue for this exception and parsing the file with CombinePDF instead. Something like this:

rescue ::PDF::Reader::MalformedPDFError => error
  if error.message.match?(/xref table not found at offset/)
    pdf = CombinePDF::PDFParser.new(value.read, allow_optional_content: true)
    pdf.parse

    if pdf.root_object[:Encrypt].present?
      record.errors.add(...)