yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

Allow for incorrect start index in xref #384

Closed: sebbASF closed this 2 years ago

sebbASF commented 2 years ago

Found several PDFs (unknown creator) with xref tables that start:

xref
1 <num>
0000000000 65535 f 

This should not happen, as the first entry in the first xref section is always supposed to be object number 0.

These files open OK in the viewers I tried.

This PR adds a workaround to allow the files to be read.

I guess it might be worth logging a warning?
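
For readers following along, here is a minimal sketch of the kind of workaround described above. It is not the code from this PR and does not use PDF::Reader's internals: the XrefEntry struct, the parse_xref_subsection helper, and the warnings array are all hypothetical names invented for illustration. It assumes a classic (non-stream) xref subsection, and treats a leading entry with offset 0, generation 65535 and type "f" as the free-list head that belongs to object 0.

```ruby
# Hypothetical helper for illustration only; not PDF::Reader's implementation.
XrefEntry = Struct.new(:offset, :generation, :in_use)

# Parses one classic xref subsection and returns { object_number => XrefEntry }.
# Any correction that is applied is recorded in the warnings array.
def parse_xref_subsection(text, warnings: [])
  lines = text.lines.map(&:strip).reject(&:empty?)
  lines.shift if lines.first == "xref"

  # Subsection header: "<first object number> <entry count>"
  start, count = lines.shift.split.map(&:to_i)

  entries = lines.first(count).map do |line|
    offset, generation, type = line.split
    XrefEntry.new(offset.to_i, generation.to_i, type == "n")
  end

  # If the header says the subsection starts at 1 but the first entry looks
  # like the free-list head, assume the header is off by one and number from 0.
  first = entries.first
  if start == 1 && first && !first.in_use &&
     first.offset.zero? && first.generation == 65_535
    warnings << "xref subsection starts at 1 but its first entry is the free-list head; assuming 0"
    start = 0
  end

  entries.each_with_index.map { |entry, i| [start + i, entry] }.to_h
end

warnings = []
sample = "xref\n1 3\n0000000000 65535 f \n0000000017 00000 n \n0000000081 00000 n \n"
table = parse_xref_subsection(sample, warnings: warnings)
```

Collecting the message in an array rather than printing it leaves the caller free to decide whether and how to log the warning.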

yob commented 2 years ago

Being able to parse PDFs like this makes a lot of sense, and I love that there's a sample PDF+spec.

I'm not super keen on the instance_variable_get though; it feels like it'll be hard to maintain in the long term. Do you think we can find an alternative way to get the spec green?
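
As a general illustration of the concern (not the change that was eventually made here), the sketch below contrasts reading private state with instance_variable_get against checking the same fact through a public method. TinyXref, store and offset_for are hypothetical names made up for this example and are not part of PDF::Reader.

```ruby
# Hypothetical stand-in for a class with private state; not PDF::Reader's API.
class TinyXref
  def initialize
    @entries = {}
  end

  # Records the byte offset for an object number and returns self for chaining.
  def store(id, offset)
    @entries[id] = offset
    self
  end

  # Public lookup, so callers and specs need not reach into @entries.
  def offset_for(id)
    @entries[id]
  end

  def size
    @entries.size
  end
end

xref = TinyXref.new.store(0, 0).store(1, 17)

# Brittle: tied to the name and shape of an instance variable.
via_ivar = xref.instance_variable_get(:@entries)[1]

# Sturdier: the same check through the public interface.
via_api = xref.offset_for(1)
```

If @entries were later renamed or changed from a Hash to another structure, only the first check would break, which is the maintenance risk raised above.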

yob commented 2 years ago

lovely!