pmaupin / pdfrw

pdfrw is a pure Python library that reads and writes PDFs

Adding a .Pages attribute in PdfWriter to allow setting its MediaBox or Resources fields #205

Open Lucas-C opened 4 years ago

Lucas-C commented 4 years ago

This allows for the following usage:

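from pdfrw import IndirectPdfDict, PdfWriter

out = PdfWriter()
# Set MediaBox / Resources once, on the root of the page tree
out.Pages = IndirectPdfDict(
    MediaBox=...,
    Resources=...,
)
out.addpage(page)
out.write(...)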
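
For context on why a shared Pages node can be useful: in the PDF format, a page may omit MediaBox or Resources and inherit them from an ancestor Pages node. Below is a minimal sketch of that lookup using plain dicts rather than pdfrw objects. The resolve_attribute helper and the sample values are hypothetical and are not part of the pdfrw API.

```python
# Simplified stand-in for a PDF page tree: plain dicts, not pdfrw objects.
# These are the page attributes the PDF format treats as inheritable.
INHERITABLE = ("Resources", "MediaBox", "CropBox", "Rotate")


def resolve_attribute(node, name):
    """Walk up the Parent chain until `name` is found; return None if absent."""
    if name not in INHERITABLE:
        raise ValueError("%s is not an inheritable page attribute" % name)
    while node is not None:
        if name in node:
            return node[name]
        node = node.get("Parent")
    return None


# Hypothetical page tree: one Pages root with two Page children.
pages = {"Type": "Pages", "MediaBox": [0, 0, 612, 792], "Parent": None}
page_a = {"Type": "Page", "Parent": pages}  # inherits MediaBox
page_b = {"Type": "Page", "Parent": pages, "MediaBox": [0, 0, 595, 842]}  # overrides

print(resolve_attribute(page_a, "MediaBox"))
print(resolve_attribute(page_b, "MediaBox"))
```

A value set directly on a page takes precedence, so a shared default on the Pages node only applies to pages that do not define their own.
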
Lucas-C commented 4 years ago

The pipeline fails because of two hash mismatches:

self = <tests.test_examples.TestOnePdf testMethod=test_rl1_platypus>

    def test_rl1_platypus(self):
        if sys.version_info < (2, 7):
            return
        self.do_test('rl1/platypus_pdf_template b1c400de699af29ea3f1983bb26870ab',
>                    scrub=True)

../../../test_examples.py:189: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../test_examples.py:112: in do_test
    self.assertEqual(hash, expects)
E   AssertionError: 'bb2449c75d96ff7913d59af89f0fd8b7' != '88bd087c4dc039ced05faea3920cbec5'
E   - bb2449c75d96ff7913d59af89f0fd8b7
E   + 88bd087c4dc039ced05faea3920cbec5

Corresponding line in expected.txt: examples/rl1/platypus_pdf_template_b1c400de699af29ea3f1983bb26870ab 88bd087c4dc039ced05faea3920cbec5

self = <tests.test_roundtrip.TestOnePdf testMethod=test_repaginate_7037a992b80b60f0294016037baa9292.pdf>

    def test(self):
>       self.roundtrip(*args, **kw)

../../test_roundtrip.py:110: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../test_roundtrip.py:94: in roundtrip
    self.assertEqual(hash, expects)
E   AssertionError: '4df027cb2f2d2b13efa04071fa5def07' != 'dd41b0104f185206b51e7ffe5b07d261'
E   - 4df027cb2f2d2b13efa04071fa5def07
E   + dd41b0104f185206b51e7ffe5b07d261

Corresponding line in expected.txt: repaginate/7037a992b80b60f0294016037baa9292.pdf dd41b0104f185206b51e7ffe5b07d261
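
Both failures come from comparing a hash of the generated file with the value recorded in expected.txt. The following is a rough sketch of that kind of check, assuming one "name hash" pair per line and an MD5 digest (the 32 hex characters suggest MD5, but the real test harness may compute it differently). parse_expected and matches are hypothetical names, not functions from the pdfrw test suite.

```python
import hashlib


def parse_expected(text):
    """Map each test name to its expected hash (one 'name hash' pair per line)."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, digest = parts
            expected[name] = digest
    return expected


def matches(name, data, expected):
    """Return True when the MD5 of `data` equals the hash recorded for `name`."""
    return hashlib.md5(data).hexdigest() == expected.get(name)


# Made-up sample data; a single changed byte is enough to change the hash.
sample = b"%PDF-1.3 sample bytes"
table = parse_expected("demo/sample.pdf " + hashlib.md5(sample).hexdigest())

print(matches("demo/sample.pdf", sample, table))         # True
print(matches("demo/sample.pdf", sample + b" ", table))  # False
```

Because the comparison is byte-exact, any change in how the writer serializes objects (ordering, an extra dictionary entry, whitespace) will show up as a mismatch even when the resulting PDF is still valid.
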

Lucas-C commented 4 years ago

I added a commit to fix the "RuntimeError: generator raised StopIteration" errors on Python 3.7 (caused by a missing call to py23_diffs.iteritems in pdfwriter.FormatObjects.format_obj), but it produced many more hash-mismatch errors... 😢

Any help would be appreciated!
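
For anyone unfamiliar with the RuntimeError mentioned above: since Python 3.7 (PEP 479), a StopIteration that escapes inside a generator body is converted into "RuntimeError: generator raised StopIteration". The snippet below is a generic illustration of that behaviour, not the pdfrw code; both function names are made up for the example.

```python
def first_items_unsafe(iterators):
    # next() on an exhausted iterator raises StopIteration inside the
    # generator; on Python 3.7+ this surfaces as RuntimeError.
    for it in iterators:
        yield next(it)


def first_items_safe(iterators):
    # Use a sentinel default so exhaustion never raises StopIteration.
    sentinel = object()
    for it in iterators:
        item = next(it, sentinel)
        if item is sentinel:
            return  # end the generator cleanly
        yield item


message = None
try:
    list(first_items_unsafe([iter([1]), iter([])]))
except RuntimeError as exc:
    message = str(exc)

print(message)
print(list(first_items_safe([iter([1]), iter([])])))
```

On Python 3.7 or later, the first print shows the same message as in the failing pipeline, and the safe variant returns the items collected before the empty iterator.
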

Lucas-C commented 3 years ago

PR copied to @sarnold's fork: https://github.com/sarnold/pdfrw/pull/5