jsvine pdfplumber issues

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

MIT License

6.57k stars 659 forks

issues

Ask for help on processing tables in case where restructuring (i.e. adding schema for) the content of pdf

#605 TCBpenta8 closed 2 years ago
0
extract_text() returns a unicode character \ufb03 LATIN SMALL LIGATURE FFI instead of the letters ffi when it comes across the word Office,

#598 colemanr03 closed 1 year ago
13
ModuleNotFoundError: No module named 'pdfplumber'

#597 sollama closed 2 years ago
6
addressing issue #578

#581 augeos-grosso closed 2 years ago
4
Context manager doesn't close file on exceptions

#578 augeos-grosso closed 2 years ago
1
ModuleNotFoundError Import issue on Mac

#577 Rickyyyyc closed 2 years ago
2
The colored word is read into two words

#576 Godlikemandyy closed 2 years ago
1
Refactor

#575 fristhon closed 2 years ago
3
The text is judged to be tabulated

#569 wenderWang closed 2 years ago
1
Make final changes for v0.6.0

#568 jsvine closed 2 years ago
2
Issue to reading data from the pdf

#567 sravanikothakota closed 2 years ago
1
Cannot locate a two column table without any borders

#566 ayusonkj closed 2 years ago
2
No tables being detected for a specific PDF.

#564 adicognext closed 2 years ago
1
Slow read speed

#563 wenderWang closed 2 years ago
1
Difference in word coordinate information

#560 yavuzKomecoglu closed 2 years ago
2
Preserving Spaces between paragraph.

#556 noorkaif97 closed 2 years ago
1
Add snap_x/y_tolerance and join_x/y_tolerance table-extraction settings

#553 jsvine closed 2 years ago
2
Extracting table spanning 2 pages

#550 SamGoodin closed 2 years ago
2
extracting table from multiple pages at time.

#549 Harshit-tech9 closed 2 years ago
5
Fix bug that crashed table extraction when null value provided for `(text|intersection)_(x|y)_tolerance` keys

#545 samkit-jain closed 2 years ago
4
Word out of page dimension in extract words

#538 ManuelFay closed 2 years ago
4
Add experimental .extract_text(layout=True)

#532 jsvine closed 2 years ago
2
pdfplumber 0.5.28 requires pdfminer.six==20200517, but you have pdfminer-six 20211012 which is incompatible

#531 alexreg closed 2 years ago
4
Remove decimalizing (but let CLI adjust precision)

#520 jsvine closed 2 years ago
1
Handle utf-16-encoded annotations (#463)

#519 jsvine closed 2 years ago
2
Is there any way to include blank lines when extracting texts?

#516 flycattt closed 3 years ago
5
Upgrade pdfminer.six from 20200517 to 20211012

#515 jsvine closed 3 years ago
2
page.extract_table() result is None?

#508 yts2020 closed 2 years ago
5
Question on usage

#507 NathanTech7713 closed 3 years ago
1
Extracting hyperlinks raises UnicodeDecodeError

#506 devWhyqueue closed 3 years ago
0
Reading a PDF raises the error: struct.error: unpack requires a bytes object of length 26

#505 xiaoranli1991 closed 2 years ago
2
fix chars deduplication for words with intentionally duplicated chars

#504 konradmalik closed 3 years ago
2
The position of the words in tables are out of order

#498 changlongpan closed 3 years ago
1
Fix slowdown in extract_words on long words (#483)

#497 jsvine closed 3 years ago
3
Difficulty extracting 'cells' from PDF without edges

#493 youpengbo2018 closed 3 years ago
0
How can I extract table without left and right vertical border correctly, and the columns can not change in the extract_table

#492 youpengbo2018 closed 3 years ago
2
When a cell text in a table breaks a line, it will be parsed into two rows

#488 hopepanwei closed 3 years ago
4
Only template is extracted

#486 Ceros95 closed 3 years ago
4
extract_words() slower when fewer extra_attrs are passed

#484 hadikoub closed 3 years ago
1
Request: Have `.extract_text()` return an empty string (`''`) instead of `None` in the case of no text found in a PDF

#482 tungph closed 2 years ago
4
Layout Detection similar to pdfminer.six

#476 jigsawcoder closed 3 years ago
1
snap_x_tolerance and snap_y_tolerance for extra flexibility in table extraction

#475 ratulbhadury closed 2 years ago
3
Jumbled text extracted

#468 ParkvilleData closed 3 years ago
5
Catch last row of each table when horizontal strategy is text

#467 bobluda closed 3 years ago
1
Missing last row from intermediate tables, when using mixed strategy

#466 bobluda closed 3 years ago
2
Cannot decode contents of annotations

#463 tungph closed 2 years ago
5
Explicitly typecast `fontname` and `text` fields to str for char objects

#462 samkit-jain closed 1 year ago
6
I found a bug when using dedupe_chars()

#461 2212168851 closed 3 years ago
1
Extract SVG Images

#454 emadg opened 3 years ago
4
extract_words is slow

#453 sreeni5493 closed 3 years ago
2
