pmaupin / pdfrw

pdfrw is a pure Python library that reads and writes PDFs

How to get the minimal bounding box for a page? (crop the margins as much as possible) #201

Open josephernest opened 4 years ago

josephernest commented 4 years ago

How can I find the minimal box that contains all the content of a page with pdfrw?

image

I looked at `page.MediaBox` and `page.CropBox`, but they do not help.


One solution would be to temporarily render the page as a bitmap image A, and then compute the largest value x_0 such that everything to the left of x_0 is white background (and likewise for the other three sides).
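To make the idea concrete, here is a rough sketch of the bitmap scan in plain Python. It assumes the page has already been rendered by some other tool (pdfrw itself does not rasterize pages) into a grayscale bitmap stored as a list of rows of 0..255 values. The function names `content_bbox` and `bbox_to_pdf_points`, the `threshold` parameter, and the assumption that the page's MediaBox starts at (0, 0) are all mine, for illustration only.

```python
def content_bbox(pixels, threshold=250):
    """Return (x0, y0, x1, y1) in pixels around all non-white content.

    pixels: list of rows, each a list of grayscale ints (0 = black, 255 = white).
    Any value below `threshold` counts as content. x1 and y1 are exclusive.
    Returns None for a completely blank page.
    """
    # Rows that contain at least one non-white pixel.
    rows = [y for y, row in enumerate(pixels) if any(v < threshold for v in row)]
    if not rows:
        return None
    # Columns that contain at least one non-white pixel.
    width = len(pixels[0])
    cols = [x for x in range(width) if any(row[x] < threshold for row in pixels)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def bbox_to_pdf_points(bbox, image_height, dpi):
    """Convert a pixel bbox (origin top-left) to PDF points (origin bottom-left).

    Assumes the bitmap covers a page whose MediaBox starts at (0, 0) and that
    it was rendered at `dpi` dots per inch (1 inch = 72 PDF points).
    """
    x0, y0, x1, y1 = bbox
    scale = 72.0 / dpi
    return (x0 * scale,
            (image_height - y1) * scale,  # flip the y axis
            x1 * scale,
            (image_height - y0) * scale)


# Tiny 6x5 "page": white everywhere except two dark pixels.
page = [[255] * 6 for _ in range(5)]
page[1][2] = 0
page[3][4] = 0

bbox = content_bbox(page)
box_in_points = bbox_to_pdf_points(bbox, image_height=len(page), dpi=144)
```

The resulting four numbers could then be used as a candidate crop box for the page. Note that a pixel scan like this only sees what is visibly non-white at the chosen resolution, so very faint marks may be missed unless the threshold is adjusted.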

Is it possible to do this with pdfrw? If not, which library should be used for such things?