jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Add --structure-text flag to CLI (like `pdfinfo -struct-text` but better) #967

Closed: dhdaines closed this 10 months ago

dhdaines commented 11 months ago

This depends on my two other PRs (#961 and #963) so don't merge it before them!

Produces helpful JSON output, much like `pdfinfo -struct-text`, but better, since it's actual JSON and not arbitrary text (or arbitrary XML, but we could do that too if you think it's useful).
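To illustrate why real JSON is easier to consume than free-form text, here is a minimal sketch that parses a structure dump and prints an indented outline. The sample data and the field names `type`, `children`, and `text` are assumptions made for illustration; this thread does not show the actual output format.

```python
import json

# Hypothetical sample output: the field names ("type", "children", "text")
# are assumptions for illustration, not taken from this PR.
SAMPLE = """
[
  {"type": "Document", "children": [
    {"type": "H1", "text": ["Title"]},
    {"type": "P", "text": ["First paragraph."]}
  ]}
]
"""


def iter_elements(nodes, depth=0):
    """Yield (depth, element) pairs in document order."""
    for node in nodes:
        yield depth, node
        yield from iter_elements(node.get("children", []), depth + 1)


def outline(json_text):
    """Return one indented line per structure element."""
    lines = []
    for depth, element in iter_elements(json.loads(json_text)):
        text = " ".join(element.get("text", []))
        label = element["type"] + (": " + text if text else "")
        lines.append("  " * depth + label)
    return lines


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE)))
```

Because the dump is plain JSON, the standard `json` module handles all the parsing, and the tree walk is a few lines of ordinary recursion.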

codecov[bot] commented 11 months ago

Codecov Report

Merging #967 (57eebcb) into develop (336f83f) will decrease coverage by 0.17%. The diff coverage is 100.00%.

Note: Current head 57eebcb differs from pull request most recent head 615a1d2. Consider uploading reports for the commit 615a1d2 to get more accurate results.

@@             Coverage Diff             @@
##           develop     #967      +/-   ##
===========================================
- Coverage   100.00%   99.83%   -0.17%     
===========================================
  Files           18       19       +1     
  Lines         1613     1840     +227     
===========================================
+ Hits          1613     1837     +224     
- Misses           0        3       +3     

| Files Changed | Coverage Δ |
|---|---|
| pdfplumber/cli.py | 100.00% <100.00%> (ø) |
| pdfplumber/page.py | 99.36% <100.00%> (-0.64%) |
| pdfplumber/pdf.py | 100.00% <100.00%> (ø) |
| pdfplumber/structure.py | 100.00% <100.00%> (ø) |

... and 1 file with indirect coverage changes

dhdaines commented 10 months ago

I will rebase this (and force-push) to make it cleaner to merge and fix the conflicts.

dhdaines commented 10 months ago

Actually, no, I will just incorporate it into #963 after rebasing that one onto the current develop branch!

dhdaines commented 10 months ago

Merged into #963