pdf-rs / pdf

Rust library to read, manipulate and write PDF files.
MIT License

Issues reading cross-reference stream with incorrect /Size #160

Status: Open. divergentdave opened this issue 1 year ago.

divergentdave commented 1 year ago

I used origami to create a password-encrypted PDF in an attempt to reproduce #159, and found that pdf-rs does not tolerate files in which the /Size of a cross-reference stream is too small to cover the object numbers listed in its /Index.

I produced the encrypted files from the existing pdf-sample.pdf with the following script:

```ruby
#!/usr/bin/env ruby
# Produce password-encrypted variants of pdf-sample.pdf using origami.
require 'origami'

# Each row: cipher, key size in bits, hardened flag, output file name.
[
  ['RC4', 40, false, 'pdf-sample_rc4_rev2.pdf'],
  ['RC4', 64, false, 'pdf-sample_rc4_rev3.pdf'],
  ['AES', 128, false, 'pdf-sample_aes_128.pdf'],
  ['AES', 256, false, 'pdf-sample_aes_256.pdf'],
  ['AES', 256, true, 'pdf-sample_aes_256_hardened.pdf'],
].each do |(cipher, key_size, hardened, file_name)|
  # Start from a fresh copy of the source file for each variant.
  pdf = Origami::PDF.read('../pdf-sample.pdf')
  pdf.encrypt(
    user_passwd: 'userpassword',
    owner_passwd: 'ownerpassword',
    cipher: cipher,
    key_size: key_size,
    hardened: hardened,
  )
  pdf.save(file_name, noindent: true)
end
```

By inspection, the trailer dictionary that origami wrote (here, the dictionary of the cross-reference stream) is inconsistent: the /Index ends with "28 2", a subsection covering objects 28 and 29, yet the /Size is only 27. When this file is loaded, the last subsection of the cross-reference stream is parsed successfully, but XRefTable::add_entries_from() then discards both of its entries (IDs 28 and 29) because they don't fit in the vector, which is sized according to /Size. As a result, later indirect references to those objects fail to resolve, and soon afterwards opening the file fails with "Entry 28 in xref table unspecified".
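To make the failure mode concrete, here is a minimal sketch using a hypothetical, simplified table type (`SimpleXRefTable` is illustrative only, not pdf-rs's actual `XRefTable`). Like the behaviour described above, it allocates one slot per object number below /Size and silently drops any subsection entry that lands past the end of the vector.

```rust
/// Hypothetical stand-in for an xref table: slot `n` holds the byte offset
/// of object `n`, if known. This is NOT the pdf-rs type.
struct SimpleXRefTable {
    entries: Vec<Option<u64>>,
}

impl SimpleXRefTable {
    /// Allocate one slot per object number below `size` (the /Size value).
    fn with_size(size: usize) -> Self {
        SimpleXRefTable { entries: vec![None; size] }
    }

    /// Record a subsection whose first object number is `first`.
    /// Entries whose object number is >= the table length are dropped
    /// without an error; the return value counts how many were dropped.
    fn add_subsection(&mut self, first: usize, offsets: &[u64]) -> usize {
        let mut dropped = 0;
        for (i, &offset) in offsets.iter().enumerate() {
            match self.entries.get_mut(first + i) {
                Some(slot) => *slot = Some(offset),
                None => dropped += 1,
            }
        }
        dropped
    }

    /// Look up the offset recorded for object `id`, if any.
    fn lookup(&self, id: usize) -> Option<u64> {
        self.entries.get(id).copied().flatten()
    }
}
```

With a table built from `/Size 27`, a subsection starting at 28 with two entries loses both, so a later lookup of object 28 finds nothing, which is the same symptom as the "Entry 28 in xref table unspecified" error.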

Should Backend::read_xref_table_and_trailer() scan the /Index array and update the table's size if necessary?
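A sketch of the check proposed above, assuming /Index has already been parsed into a flat list of integers. Per the PDF specification, /Index is an array of (first object number, entry count) pairs and defaults to `[0 Size]` when absent, so the size it implies is the largest `first + count`. The function names here are hypothetical, not part of pdf-rs.

```rust
/// Size implied by a cross-reference stream's /Index array, given as a flat
/// list `[first0, count0, first1, count1, ...]`. Returns None if the array
/// has odd length or a `first + count` sum overflows u32.
fn size_implied_by_index(index: &[u32]) -> Option<u32> {
    if index.len() % 2 != 0 {
        return None;
    }
    let mut max_end = 0u32;
    for pair in index.chunks_exact(2) {
        // One past the last object number covered by this subsection.
        let end = pair[0].checked_add(pair[1])?;
        max_end = max_end.max(end);
    }
    Some(max_end)
}

/// Size to allocate for the table: the declared /Size, grown if /Index
/// references higher object numbers. A missing or malformed /Index falls
/// back to the declared value.
fn effective_table_size(declared_size: u32, index: Option<&[u32]>) -> u32 {
    match index.and_then(size_implied_by_index) {
        Some(implied) => declared_size.max(implied),
        None => declared_size,
    }
}
```

For the file in this issue, an /Index ending in `28 2` implies at least 30 entries, so a declared /Size of 27 would be raised to 30. A real implementation would probably also want an upper bound here, since a hostile /Index could otherwise request a huge allocation.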

s3bk commented 1 year ago

I think there needs to be a fallback that reads the entire file and rebuilds the xref table. What you proposed can be implemented as well.
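One common way to build such a fallback is to scan the raw bytes for `N G obj` headers and record where each object starts. The sketch below is hypothetical (the function names are illustrative, not pdf-rs APIs) and deliberately naive: it does not look inside compressed object streams, and binary stream data that happens to contain a pattern like `1 0 obj` would produce a false match.

```rust
use std::collections::BTreeMap;

/// Scan raw PDF bytes for "<num> <gen> obj" headers and map each object
/// number to (generation, byte offset of the header). Later definitions
/// overwrite earlier ones, as with incremental updates.
fn rebuild_xref(data: &[u8]) -> BTreeMap<u32, (u16, usize)> {
    let mut table = BTreeMap::new();
    let mut i = 0;
    while i < data.len() {
        // Only try to parse where a token can start.
        let at_token_start = i == 0 || data[i - 1].is_ascii_whitespace();
        if at_token_start && data[i].is_ascii_digit() {
            if let Some((num, gen, next)) = parse_obj_header(data, i) {
                table.insert(num, (gen, i));
                i = next;
                continue;
            }
        }
        i += 1;
    }
    table
}

/// Try to parse "<num> <gen> obj" at `start`; return (num, gen, end offset).
fn parse_obj_header(data: &[u8], start: usize) -> Option<(u32, u16, usize)> {
    let (num, p) = parse_uint(data, start)?;
    let p = skip_ws(data, p)?;
    let (gen, p) = parse_uint(data, p)?;
    let p = skip_ws(data, p)?;
    if !data[p..].starts_with(b"obj") {
        return None;
    }
    let end = p + 3;
    // Reject things like "objx": the keyword must end here.
    if let Some(&b) = data.get(end) {
        if b.is_ascii_alphanumeric() {
            return None;
        }
    }
    if num > u64::from(u32::MAX) || gen > u64::from(u16::MAX) {
        return None;
    }
    Some((num as u32, gen as u16, end))
}

/// Parse a run of ASCII digits; None if there are none or on overflow.
fn parse_uint(data: &[u8], start: usize) -> Option<(u64, usize)> {
    let mut p = start;
    let mut value: u64 = 0;
    while p < data.len() && data[p].is_ascii_digit() {
        value = value.checked_mul(10)?.checked_add(u64::from(data[p] - b'0'))?;
        p += 1;
    }
    if p == start { None } else { Some((value, p)) }
}

/// Skip at least one whitespace byte; None if there is none.
fn skip_ws(data: &[u8], start: usize) -> Option<usize> {
    let mut p = start;
    while p < data.len() && data[p].is_ascii_whitespace() {
        p += 1;
    }
    if p == start { None } else { Some(p) }
}
```

Because a later `N G obj` for the same object number replaces the earlier entry, the newest definition appended by an incremental update wins, which is what a reader would normally want when the original xref data cannot be trusted.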