py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

Add an argument ``layout_mode_height_weight`` to control inference of vertical space when extracting text in layout mode #2915

Closed hpierre001 closed 4 weeks ago

hpierre001 commented 1 month ago

Sometimes two lines are separated by an empty line whose height is smaller than font_height. The goal is to control whether that gap is kept as an empty line or not.
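To make the request concrete, here is a minimal sketch of how such a weight could affect the blank-line decision. This is not pypdf's implementation: `blank_lines_between`, `render`, the `(y, font_height, text)` tuples, and the formula itself are all assumptions made for illustration. Only the kwarg name `layout_mode_height_weight` comes from the issue title, and it appears here as a plain `height_weight` parameter.

```python
# Illustrative sketch only; the helper names and the formula are hypothetical,
# not taken from pypdf's source.

def blank_lines_between(y_prev, y_curr, font_height, height_weight=1.0):
    """Estimate how many empty lines fit between two consecutive text lines.

    The vertical gap is divided by the weighted font height. A weight below
    1.0 makes smaller gaps count as an empty line; a weight above 1.0
    requires a larger gap before one is inserted.
    """
    if font_height <= 0 or height_weight <= 0:
        return 0
    gap = abs(y_prev - y_curr)
    return max(0, int(gap / (font_height * height_weight)) - 1)


def render(lines, height_weight=1.0):
    """Join (y, font_height, text) tuples, inserting the inferred blank lines."""
    out = []
    y_prev = None
    for y, font_height, text in lines:
        if y_prev is not None:
            count = blank_lines_between(y_prev, y, font_height, height_weight)
            out.extend([""] * count)
        out.append(text)
        y_prev = y
    return "\n".join(out)


# Two lines 18 units apart with a 10-unit font: the gap between them is
# shorter than a full line of text.
lines = [(700, 10, "A"), (682, 10, "B")]
default_output = render(lines)                    # gap is dropped
weighted_output = render(lines, height_weight=0.8)  # gap is kept as an empty line
```

With the default weight the two lines come out adjacent, while a weight of 0.8 preserves the small gap as one empty line, which is the behavior this request asks to make configurable.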

stefan6419846 commented 1 month ago

Feel free to submit a corresponding PR which adds the new kwarg and appropriate tests.