jsvine
/
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
5.99k
stars
618
forks
issues
Offset in bounding boxes for every pdf passed
#1049
yashsandansing
closed
6 months ago
2
[Documentation] `annot` properties
#1044
Pk13055
closed
7 months ago
5
Always converts the pattern 'ti' to the number 5 when extracting text, e.g. question to ques5on, solution to solu5on.
#1043
rucwangw
closed
7 months ago
2
mixed documentation of PDF.close() and Page.flush_cache() in README
#1042
luketudge
closed
7 months ago
3
Add `x_tolerance_ratio` param to `extract_text` and similar functions (now properly linted!)
#1041
afriedman412
closed
7 months ago
2
adding extract_text_dir_sensitive
#1040
afriedman412
closed
4 months ago
9
Add `x_tolerance_ratio` param to `extract_text` and similar functions
#1037
afriedman412
closed
8 months ago
0
Add `Page.table_of_contents`
#1034
jsvine
opened
8 months ago
0
Issue with importing pdfplumber library
#1033
FatemaD1577
closed
8 months ago
6
Add `gswin64c` as another possible GS executable
#1032
echedey-ls
closed
8 months ago
2
ghostscript x64 not found on Windows, even though it is on %PATH%
#1031
echedey-ls
closed
8 months ago
0
`extract_text(extra_attrs=["size"])` raises a parsing error
#1030
RitaMarques
closed
8 months ago
2
Merge v0.10.3 into stable
#1029
jsvine
closed
8 months ago
1
extract_table omits the last row that is incomplete
#1024
chengtie
closed
8 months ago
1
No corresponding JavaScript version available?
#1023
jameslun
closed
8 months ago
1
pdfplumber characters missing (for Chinese characters)
#1022
mosescha
closed
8 months ago
9
Possible to output to new pdf
#1021
yezhengli-Mr9
closed
8 months ago
1
Update link causing 404
#1020
hussainshaikh12
closed
8 months ago
2
Possibility to extract table of contents
#1018
Alssndr0
closed
8 months ago
2
Add `.extract_table(...)` logic to avoid assigning characters to multiple cells
#1013
jsvine
closed
8 months ago
1
extract table headings along with table contents
#1008
poojitharamachandra
closed
8 months ago
0
When I use the extract_text function, the x_tolerance argument doesn't work for me.
#1004
papandadj
closed
8 months ago
0
Update README.md
#1003
jakobdo
closed
7 months ago
4
Polygons other than rects for crop (etc)
#1001
pseudomonas
opened
9 months ago
3
Functions that can be multi-threaded - Enhancement to documentation
#995
sandzone
opened
9 months ago
5
Extract tables not extracting particular format of tables
#993
John-Peter-R
closed
9 months ago
1
Help! Parsing the file produces duplicated text
#992
PeifengRen
closed
6 months ago
5
For text extraction, add fractional versions of `x/y_tolerance` arguments
#987
jsvine
opened
9 months ago
14
Improving PDF-to-Text Conversion: Integrating Tables as Markup Text on a Page-by-Page Basis
#984
Isha09Garg
opened
9 months ago
1
Respect `use_text_flow` in `extract_text`
#983
dhdaines
closed
9 months ago
3
`extract_text(use_text_flow=True)` apparently does nothing
#982
dhdaines
closed
9 months ago
2
Extract table merged cells
#979
John-Peter-R
opened
10 months ago
4
Header missing
#977
blockchainDevCDST
closed
10 months ago
0
Add --structure-text flag to CLI (like `pdfinfo -struct-text` but better)
#967
dhdaines
closed
10 months ago
4
fix issue 964
#965
jnhyperion
opened
11 months ago
2
extracted word is broken
#964
jnhyperion
closed
10 months ago
7
Support for PDF 1.3 logical structure
#963
dhdaines
closed
7 months ago
7
Support for marked content section IDs
#961
dhdaines
closed
10 months ago
4
Cryptography can't find file.
#960
mdevore300
opened
11 months ago
5
update form-parsing example in README
#958
jeremybmerrill
closed
10 months ago
11
When I try to use extract_words(), I can't get some text
#956
fangjiyuan
opened
11 months ago
6
UnicodeEncodeError: 'charmap' codec can't encode character '\uf0b7' in position 908: character maps to <undefined>
#953
jchristn
closed
10 months ago
20
Merge v0.10.2
#951
jsvine
closed
11 months ago
1
.to_image() treats a stream as a regular file
#948
Urbener
closed
11 months ago
3
Accept Iterable for geometry utils (fixes #945)
#946
dhdaines
closed
11 months ago
3
`pdfplumber.utils` functions should take `Iterable` and not `List` arguments
#945
dhdaines
closed
11 months ago
1
Extracting a table with vertical text gives an unreadable result
#942
Dragon2fly
opened
11 months ago
9
Get the Text associated with the hyperlinks - PdfPlumber
#940
mukundhareddy1996
closed
11 months ago
2
Add support for structure tree and marked content sections
#937
dhdaines
closed
11 months ago
9
v0.10.0
#936
jsvine
closed
11 months ago
1