PDF format - Githubissues

kaitai-io / kaitai_struct_formats

Kaitai Struct: library of binary file formats (.ksy)

http://formats.kaitai.io

PDF format #611

Open heinrich5991 opened 2 years ago

heinrich5991 commented 2 years ago

Specification: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf
Sample: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf

armijnhemel commented 2 years ago

Please note: PDF is typically parsed from the end of the file using an index of offsets (the cross-reference table). That is difficult with Kaitai Struct, because you first have to skip over all the data, search for the index, and only then parse the file using the information from the index.
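To illustrate the end-first parsing order described above, here is a minimal Python sketch that looks for the last `startxref` keyword near the end of a file and reads the offset that follows it. The function name `find_startxref`, the 1024-byte tail window, and the tiny hand-built `sample` are assumptions made for this example; a real reader would also have to handle cross-reference streams, incremental updates, and damaged files.

```python
import re


def find_startxref(data: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset written after the last 'startxref' keyword.

    Only the last tail_size bytes are searched (an arbitrary window chosen
    for this sketch), since the keyword sits just before the final %%EOF.
    """
    tail = data[-tail_size:]
    # Use rfind: an incrementally updated file can contain several
    # 'startxref' keywords, and the last one is the current one.
    pos = tail.rfind(b"startxref")
    if pos == -1:
        raise ValueError("no startxref keyword found near end of file")
    match = re.match(rb"startxref\s+(\d+)\s+%%EOF", tail[pos:])
    if match is None:
        raise ValueError("malformed startxref section")
    return int(match.group(1))


# Hypothetical minimal file, built by hand so the offset is known.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)  # the xref table starts right after the body
sample = (
    body
    + b"xref\n0 2\n"
    + b"0000000000 65535 f \n"
    + b"0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"
)

offset = find_startxref(sample)
print(offset, sample[offset:offset + 4])
```

The point of the sketch is the order of operations: nothing in the body can be located reliably until the tail has been searched, which is the part that is awkward to express in a declarative format description.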

Kreijstal commented 2 years ago

If you want to understand PDF better, use qpdf.

rillig commented 7 months ago

It would definitely be interesting to see how far Kaitai Struct can go in modeling the PDF format, given these peculiarities:

Embedded streams that can be decoded into other file formats (TTF, PNG, JPEG)
Multiple references to the same PDF object
Possible gaps in the file that could be garbage-collected or used for steganography
Circular references between PDF objects
Textual PDF operators such as l, m, and Tj
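Two of the points above, shared references and circular references, can be made concrete with a rough Python sketch. It scans for `N G obj ... endobj` blocks and `N G R` references with regular expressions, then reports objects that are referenced more than once and whether the reference graph contains a cycle. The helper names (`reference_graph`, `shared_targets`, `has_cycle`) and the hand-written `sample` are assumptions for illustration only; a regex scan like this ignores stream contents, object streams, and string literals, so it is not a real PDF parser.

```python
import re
from collections import Counter

# Naive patterns: assume objects are stored uncompressed in the file body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def reference_graph(data: bytes) -> dict:
    """Map each object number to the set of object numbers it references."""
    graph = {}
    for m in OBJ_RE.finditer(data):
        body = m.group(3)
        graph[int(m.group(1))] = {int(r.group(1)) for r in REF_RE.finditer(body)}
    return graph


def shared_targets(graph: dict) -> set:
    """Objects that are referenced from more than one other object."""
    counts = Counter(t for targets in graph.values() for t in targets)
    return {t for t, n in counts.items() if n > 1}


def has_cycle(graph: dict) -> bool:
    """Depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node) -> bool:
        state[node] = GREY
        for nxt in graph.get(node, ()):
            seen = state.get(nxt, WHITE)
            if seen == GREY:
                return True
            if seen == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in graph)


# Hypothetical page tree: /Kids and /Parent point at each other,
# and both pages share one resources dictionary.
sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Resources 5 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Type /Page /Parent 2 0 R /Resources 5 0 R >>\nendobj\n"
    b"5 0 obj\n<< /Font << >> >>\nendobj\n"
)

graph = reference_graph(sample)
print(sorted(shared_targets(graph)), has_cycle(graph))
```

Even this small page tree is cyclic, because each page's `/Parent` entry points back at the node whose `/Kids` array lists it. Any model of PDF therefore has to treat references as lookups into a table of objects rather than as nested substructures.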