jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Refactor several complex methods and add `extra_attrs` to `.extract_words(...)` #260

Closed: jsvine closed this pull request 4 years ago

jsvine commented 4 years ago

This commit largely focuses on refactoring a few previously complex methods. In the process, it also adds the long-requested ability to group a word's characters by extra attributes (such as `fontname` and `size`) and to report those attributes on each extracted word (#28).
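
In released versions of pdfplumber, the new option takes a list of character-attribute names, e.g. `page.extract_words(extra_attrs=["fontname", "size"])`, and each returned word dict then carries those keys. The sketch below illustrates only the grouping idea on plain dicts shaped like pdfplumber's char objects. `group_words` and `make_chars` are hypothetical helpers, not the library's implementation, and they assume the characters already sit on one line in left-to-right order (the real method also clusters characters by vertical position).

```python
from typing import Any, Dict, List, Sequence

Char = Dict[str, Any]


def group_words(
    chars: Sequence[Char],
    extra_attrs: Sequence[str] = (),
    x_tolerance: float = 3.0,  # assumed gap threshold, in PDF units
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: split one line of chars into words.

    A new word starts at a whitespace char, at a horizontal gap wider
    than x_tolerance, or when any attribute named in extra_attrs changes.
    """
    words: List[Dict[str, Any]] = []
    current: List[Char] = []

    def flush() -> None:
        if not current:
            return
        word: Dict[str, Any] = {
            "text": "".join(c["text"] for c in current),
            "x0": min(c["x0"] for c in current),
            "x1": max(c["x1"] for c in current),
        }
        # All chars in the word share these values, so copy from the first.
        for attr in extra_attrs:
            word[attr] = current[0][attr]
        words.append(word)
        current.clear()

    for char in chars:
        if char["text"].isspace():
            flush()
            continue
        if current:
            prev = current[-1]
            gap = char["x0"] - prev["x1"]
            attrs_differ = any(char[a] != prev[a] for a in extra_attrs)
            if gap > x_tolerance or attrs_differ:
                flush()
        current.append(char)
    flush()
    return words


def make_chars(text: str, x_start: float, fontname: str, size: float,
               width: float = 5.0) -> List[Char]:
    """Hypothetical helper: fixed-width chars laid out left to right."""
    return [
        {"text": ch, "x0": x_start + i * width, "x1": x_start + (i + 1) * width,
         "fontname": fontname, "size": size}
        for i, ch in enumerate(text)
    ]


# "Bold" and "Text" touch (no gap) but use different fonts.
chars = make_chars("Bold", 0, "Helvetica-Bold", 12) + make_chars("Text", 20, "Helvetica", 12)
print([w["text"] for w in group_words(chars)])
print([w["text"] for w in group_words(chars, extra_attrs=["fontname", "size"])])
```

Without `extra_attrs`, the two runs merge into `BoldText`; with `extra_attrs=["fontname", "size"]`, the font change forces a word boundary and each word reports its own `fontname` and `size`.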

See CHANGELOG.md for a more detailed list of changes.

codecov[bot] commented 4 years ago

Codecov Report

Merging #260 into develop will increase coverage by 0.02%. The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop     #260      +/-   ##
===========================================
+ Coverage    97.41%   97.44%   +0.02%     
===========================================
  Files           10       10              
  Lines         1160     1173      +13     
===========================================
+ Hits          1130     1143      +13     
  Misses          30       30              

| Impacted Files | Coverage Δ |
|---|---|
| pdfplumber/container.py | 100.00% <ø> (ø) |
| pdfplumber/convert.py | 100.00% <100.00%> (ø) |
| pdfplumber/page.py | 100.00% <100.00%> (ø) |
| pdfplumber/utils.py | 100.00% <100.00%> (ø) |
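
The headline numbers follow from coverage = hits / lines. The sketch below recomputes them from the Hits and Lines rows of the diff, assuming Codecov's default two-decimal `round: down` display; `coverage_pct` and `show` are illustrative names, not part of any Codecov API.

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Unrounded percentage of tracked lines that the test suite hits."""
    return Decimal(hits) / Decimal(lines) * 100


def show(value: Decimal) -> Decimal:
    """Truncate to two decimals (assumed round-down display)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


before = coverage_pct(1130, 1160)  # develop
after = coverage_pct(1143, 1173)   # #260

print(show(before), show(after), show(after - before))  # prints: 97.41 97.44 0.02
```

The unrounded change is about 0.0287 percentage points, so truncating gives the reported +0.02% even though the two displayed totals differ by 0.03; ordinary rounding of the delta would have shown 0.03.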


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 59a7dd2...cb92434.