jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
6.02k stars 619 forks

Add `x_tolerance_ratio` param to `extract_text` and similar functions (now properly linted!) #1041

Closed. afriedman412 closed this 8 months ago

afriedman412 commented 8 months ago

Partially fixes https://github.com/jsvine/pdfplumber/issues/987.

Passing `x_tolerance_ratio` to `extract_text()`, or to any other function that relies on `WordExtractor`, makes the horizontal tolerance equal to `x_tolerance_ratio` times the text size when determining where words begin and end. When provided, it overrides the `x_tolerance` param.

There is room to build out a `y_tolerance_ratio` too, if the need arises in the future!
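
To illustrate the idea described above, here is a minimal, self-contained sketch of ratio-based word splitting. It is not pdfplumber's actual implementation: the `split_words` helper and the sample character dicts are hypothetical, and the sketch assumes a single left-to-right line whose chars carry `text`, `x0`, `x1`, and `size` keys. The default of `3.0` for the fixed tolerance and the ratio of `0.15` are illustrative values only.

```python
def split_words(chars, x_tolerance=3.0, x_tolerance_ratio=None):
    """Group chars on one left-to-right line into words.

    Hypothetical helper for illustration only. A new word starts when the
    horizontal gap between two consecutive chars exceeds the tolerance.
    If x_tolerance_ratio is given, the tolerance is ratio * size of the
    previous char, and the fixed x_tolerance is ignored.
    """
    words = []
    current = []
    for char in chars:
        if current:
            prev = current[-1]
            if x_tolerance_ratio is None:
                tolerance = x_tolerance
            else:
                tolerance = x_tolerance_ratio * prev["size"]
            if char["x0"] - prev["x1"] > tolerance:
                words.append("".join(c["text"] for c in current))
                current = []
        current.append(char)
    if current:
        words.append("".join(c["text"] for c in current))
    return words


# Very small text (size 4): the space between "Hi" and "yo" is only 1.5 wide,
# which is below the fixed tolerance of 3.0 but above 0.15 * 4 = 0.6.
small_chars = [
    {"text": "H", "x0": 0.0, "x1": 2.0, "size": 4.0},
    {"text": "i", "x0": 2.2, "x1": 4.2, "size": 4.0},
    {"text": "y", "x0": 5.7, "x1": 7.7, "size": 4.0},
    {"text": "o", "x0": 7.9, "x1": 9.9, "size": 4.0},
]

fixed_result = split_words(small_chars)
ratio_result = split_words(small_chars, x_tolerance_ratio=0.15)
print(fixed_result)
print(ratio_result)
```

With the fixed tolerance the two words merge into one, while the ratio-based tolerance scales with the font size and keeps them apart. In pdfplumber itself, per this PR, the corresponding call would be along the lines of `page.extract_text(x_tolerance_ratio=0.15)`.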

codecov[bot] commented 8 months ago

Codecov Report

Merging #1041 (33ac833) into develop (d9561d1) will not change coverage. Report is 14 commits behind head on develop. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##           develop     #1041   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           18        18           
  Lines         1615      1620    +5     
=========================================
+ Hits          1615      1620    +5     

| Files | Coverage Δ |
|---|---|
| pdfplumber/utils/text.py | 100.00% <100.00%> (ø) |

jsvine commented 8 months ago

Thanks, and now merging!