jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

how to know the page bounding box parameters? #275

Closed by fdq09eca 4 years ago

fdq09eca commented 4 years ago

I keep running into this error when cropping a page, and I would like to know whether there is a way to get the page's exact bounding box limits.

ValueError: Bounding box (Decimal('55.256'), Decimal('0'), Decimal('541.199'), Decimal('550.669')) is not fully within parent page bounding box (Decimal('55.256'), Decimal('34.985'), Decimal('541.199'), Decimal('550.669'))

As shown above, the top of the cropping bounding box is 0. I substitute 0 whenever the minimum top value I get from page.chars is less than 0, since a negative top obviously makes no sense. Sometimes this replacement works fine, but sometimes it throws this error, because the page's top limit is not 0.

I tried within_bbox, but the error persists. Is there a better solution? Thank you.

fdq09eca commented 4 years ago

Found out that a simple ternary (conditional expression) in a list comprehension does the job.
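
For anyone landing here later, a minimal sketch of what such a ternary could look like. The exact code used was not posted, so the helper name clamp_bbox and its argument names are hypothetical. The idea is to clamp each coordinate of the crop box to the parent limits instead of replacing negative values with a hard-coded 0.

```python
from decimal import Decimal


def clamp_bbox(bbox, page_bbox):
    """Clamp a (x0, top, x1, bottom) box so it lies within page_bbox.

    Hypothetical helper, not part of pdfplumber.
    """
    px0, ptop, px1, pbottom = page_bbox
    # Lower and upper limit for each of the four coordinates, in order.
    lows = (px0, ptop, px0, ptop)
    highs = (px1, pbottom, px1, pbottom)
    # Chained ternary inside a comprehension: raise values below the limit,
    # lower values above it, keep everything else unchanged.
    return tuple(
        lo if v < lo else hi if v > hi else v
        for v, lo, hi in zip(bbox, lows, highs)
    )


# Values taken from the error message above.
page_limits = (Decimal("55.256"), Decimal("34.985"), Decimal("541.199"), Decimal("550.669"))
crop_box = (Decimal("55.256"), Decimal("0"), Decimal("541.199"), Decimal("550.669"))
safe_box = clamp_bbox(crop_box, page_limits)
```

With pdfplumber this would be used roughly as page.crop(clamp_bbox(crop_box, page.bbox)), assuming page.bbox holds the parent limits reported in the error message.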