rschroll / rmrl

Render reMarkable documents to PDF
GNU General Public License v3.0

merge_pages() recursively searches for page size (fixes #11) #13

Open Gigahawk opened 2 years ago

Gigahawk commented 2 years ago

Spent a bit of time looking into this on my own before realizing someone had already figured it out in #11.

I ran the following recursive dict search (mostly taken from here) on a PDF of a random textbook I have:

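def _finditem(obj, key, path=None):
    """Depth-first search of nested dicts for `key`; prints the path to the first hit."""
    if path is None:
        path = []
    if key in obj:
        print(f"key {key} found at path {path}")
        return obj[key]
    # Not at this level: descend into every dict-valued entry.
    for k, v in obj.items():
        if isinstance(v, dict):
            item = _finditem(v, key, path + [k])
            if item is not None:
                return item
    return None

print(_finditem(basepage, '/MediaBox'))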

The output looks like:

key /MediaBox found at path ['/Parent', '/Parent', '/Parent', '/Parent', '/Parent']
['0', '0', '612', '792']
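
The path above consists only of /Parent hops, which fits /MediaBox being an inheritable page attribute in PDF: a page that lacks it takes the value from the nearest ancestor in the page tree. A narrower lookup that follows only /Parent, rather than every nested dict, might look roughly like the sketch below. It uses plain dicts as stand-ins for the PDF library's page objects, and `find_inherited` is a hypothetical name, not something that exists in rmrl.

```python
def find_inherited(page, key, parent_key="/Parent"):
    """Return (value, hops) for `key`, looking at `page` and then its ancestors.

    Hypothetical helper: only the parent link is followed, so unrelated
    nested dicts (resources, annotations, ...) are never searched.
    """
    seen = set()  # ids of visited nodes, guards against a cyclic parent chain
    node = page
    hops = 0
    while isinstance(node, dict) and id(node) not in seen:
        seen.add(id(node))
        if key in node:
            return node[key], hops
        node = node.get(parent_key)
        hops += 1
    return None, None


# Toy page tree (assumed data): only the root carries /MediaBox.
root = {"/Type": "/Pages", "/MediaBox": ["0", "0", "612", "792"]}
middle = {"/Type": "/Pages", "/Parent": root}
leaf = {"/Type": "/Page", "/Parent": middle}

box, hops = find_inherited(leaf, "/MediaBox")
print(box, hops)  # ['0', '0', '612', '792'] 2
```

Restricting the walk to /Parent keeps the lookup from matching a /MediaBox that belongs to some other object reachable from the page.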

The output matches what pdfinfo finds:

$ pdfinfo -box /mnt/d/remarkable_sync/5cf892dc-6471-430c-9c75-7e83867f5eab.pdf
Title:          Mastering STM32
Author:         Carmine Noviello
Creator:        LaTeX with hyperref package
Producer:       XeTeX 0.99999
CreationDate:   Fri Aug 17 06:35:42 2018 PDT
ModDate:        Tue Jun 11 03:03:22 2019 PDT
Tagged:         no
UserProperties: no
Suspects:       no
Form:           AcroForm
JavaScript:     no
Pages:          852
Encrypted:      no
Page size:      612 x 792 pts (letter)
Page rot:       0
MediaBox:           0.00     0.00   612.00   792.00
CropBox:            0.00     0.00   612.00   792.00
BleedBox:           0.00     0.00   612.00   792.00
TrimBox:            0.00     0.00   612.00   792.00
ArtBox:             0.00     0.00   612.00   792.00
File size:      38994922 bytes
Optimized:      no
PDF version:    1.5
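
To compare the search result with the `Page size` line above, the four MediaBox entries can be reduced to a width and height. This is a minimal sketch that assumes the box arrives as a list of numeric strings, as in the printed output; `box_size` is a hypothetical name.

```python
def box_size(box):
    """Return (width, height) in points for a PDF box [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = (float(v) for v in box)
    # A PDF rectangle may list its two corners in either order, hence abs().
    return abs(x1 - x0), abs(y1 - y0)


width, height = box_size(['0', '0', '612', '792'])
print(f"{width:g} x {height:g} pts")  # 612 x 792 pts
```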