jsvine
/
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
6.57k
stars
659
forks
issues
Ask for help on processing tables in cases where restructuring (i.e. adding a schema for) the content of the PDF
#605
TCBpenta8
closed
2 years ago
0
extract_text() returns a unicode character \ufb03 LATIN SMALL LIGATURE FFI instead of the letters ffi when it comes across the word Office,
#598
colemanr03
closed
1 year ago
13
ModuleNotFoundError: No module named 'pdfplumber'
#597
sollama
closed
2 years ago
6
addressing issue #578
#581
augeos-grosso
closed
2 years ago
4
Context manager doesn't close file on exceptions
#578
augeos-grosso
closed
2 years ago
1
ModuleNotFoundError Import issue on Mac
#577
Rickyyyyc
closed
2 years ago
2
The colored word is read into two words
#576
Godlikemandyy
closed
2 years ago
1
Refactor
#575
fristhon
closed
2 years ago
3
The text is judged to be tabulated
#569
wenderWang
closed
2 years ago
1
Make final changes for v0.6.0
#568
jsvine
closed
2 years ago
2
Issue to reading data from the pdf
#567
sravanikothakota
closed
2 years ago
1
Cannot locate a two column table without any borders
#566
ayusonkj
closed
2 years ago
2
No tables being detected for a specific PDF.
#564
adicognext
closed
2 years ago
1
Slow read speed
#563
wenderWang
closed
2 years ago
1
Difference in word coordinate information
#560
yavuzKomecoglu
closed
2 years ago
2
Preserving Spaces between paragraph.
#556
noorkaif97
closed
2 years ago
1
Add snap_x/y_tolerance and join_x/y_tolerance table-extraction settings
#553
jsvine
closed
2 years ago
2
Extracting table spanning 2 pages
#550
SamGoodin
closed
2 years ago
2
extracting table from multiple pages at time.
#549
Harshit-tech9
closed
2 years ago
5
Fix bug that crashed table extraction when null value provided for `(text|intersection)_(x|y)_tolerance` keys
#545
samkit-jain
closed
2 years ago
4
Word out of page dimension in extract words
#538
ManuelFay
closed
2 years ago
4
Add experimental .extract_text(layout=True)
#532
jsvine
closed
2 years ago
2
pdfplumber 0.5.28 requires pdfminer.six==20200517, but you have pdfminer-six 20211012 which is incompatible
#531
alexreg
closed
2 years ago
4
Remove decimalizing (but let CLI adjust precision)
#520
jsvine
closed
2 years ago
1
Handle utf-16-encoded annotations (#463)
#519
jsvine
closed
2 years ago
2
Is there any way to include blank lines when extracting texts?
#516
flycattt
closed
3 years ago
5
Upgrade pdfminer.six from 20200517 to 20211012
#515
jsvine
closed
3 years ago
2
page.extract_table() result is None?
#508
yts2020
closed
2 years ago
5
Question on usage
#507
NathanTech7713
closed
3 years ago
1
Extracting hyperlinks raises UnicodeDecodeError
#506
devWhyqueue
closed
3 years ago
0
Reading a PDF raises an error: struct.error: unpack requires a bytes object of length 26
#505
xiaoranli1991
closed
2 years ago
2
fix chars deduplication for words with intentionally duplicated chars
#504
konradmalik
closed
3 years ago
2
The position of the words in tables are out of order
#498
changlongpan
closed
3 years ago
1
Fix slowdown in extract_words on long words (#483)
#497
jsvine
closed
3 years ago
3
Difficulty extracting 'cells' from PDF without edges
#493
youpengbo2018
closed
3 years ago
0
How can I extract table without left and right vertical border correctly, and the columns can not change in the extract_table
#492
youpengbo2018
closed
3 years ago
2
When a cell text in a table breaks a line, it will be parsed into two rows
#488
hopepanwei
closed
3 years ago
4
Only template is extracted
#486
Ceros95
closed
3 years ago
4
extract_words() slower when fewer extra_attrs are passed
#484
hadikoub
closed
3 years ago
1
Request: Have `.extract_text()` return an empty string (`''`) instead of `None` in the case of no text found in a PDF
#482
tungph
closed
2 years ago
4
Layout Detection similar to pdfminer.six
#476
jigsawcoder
closed
3 years ago
1
snap_x_tolerance and snap_y_tolerance for extra flexibility in table extraction
#475
ratulbhadury
closed
2 years ago
3
Jumbled text extracted
#468
ParkvilleData
closed
3 years ago
5
Catch last row of each table when horizontal strategy is text
#467
bobluda
closed
3 years ago
1
Missing last row from intermediate tables, when using mixed strategy
#466
bobluda
closed
3 years ago
2
Cannot decode contents of annotations
#463
tungph
closed
2 years ago
5
Explicitly typecast `fontname` and `text` fields to str for char objects
#462
samkit-jain
closed
1 year ago
6
i find a bug when use dedupe_chars()
#461
2212168851
closed
3 years ago
1
Extract SVG Images
#454
emadg
opened
3 years ago
4
extract_words is slow
#453
sreeni5493
closed
3 years ago
2