jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
5.97k
stars
618
forks
issues
add ignore_char_properties arg in dedupe_chars
#1161
QuentinAndre11
opened
6 days ago
5
Incorrect Annotation Coordinates on Landscape Pages
#1160
ongmk
closed
6 days ago
0
Error While extracting Non-English tables (Arabic) reversed Text
#1159
iiAmeer
opened
1 week ago
3
Failed to extract the table without left and right borders.
#1157
sunjingji
closed
2 weeks ago
1
Multiple letters extracted on PDF table by using extract_text
#1155
ervinwirth
closed
2 weeks ago
5
extract_tables: how to remove slanted text from a row
#1153
zyc1128
closed
2 weeks ago
0
unsupported operand type(s) for *: 'float' and 'PSLiteral'
#1148
sasuke00
closed
1 week ago
3
I got a bug when i parsing a pdf!!!
#1147
idiotTest
closed
3 weeks ago
2
When I set repair=true,there is an error:'utf-8' codec can't decode byte 0xae in position 239: invalid start byte.Because of the original PDF?
#1145
zyc1128
opened
1 month ago
1
page.img seems to have no key for the image itself, only some image information? How can I get the image objects in a PDF?
#1144
zhaoyuchen1128
closed
3 weeks ago
0
Iterating over page.chars fails to return the chars in some tables; among tables with identical formatting, some cannot be retrieved
#1143
zyc1128
closed
3 weeks ago
0
Demonstrations / Examples - links are not available
#1142
koda0601
closed
3 weeks ago
2
page.to_image() PDFium: Data format error
#1140
Hucley
closed
1 month ago
3
Concatenating cropped page objects
#1138
Phylanxy
closed
1 month ago
1
char need an attr linewidth
#1137
xuehuiareafred
opened
1 month ago
4
Text with imaginary lines is being treated as a table
#1136
sachinnethakanipersonal
closed
1 month ago
0
Nothing founded from a pdf. No pages, no chars, nothing.
#1135
AppleRabbitDENG
closed
1 month ago
3
Extracting text from PDFs with encodings- Identity-H, Roman fails, gives a blank response.
#1132
nishantkumar21stjul
closed
1 month ago
1
function "extract_words" extract words that don't exist in a pdf.
#1129
HKAFITGlitter
closed
1 month ago
2
table_settings support draw line per page
#1121
rahxphoon
closed
2 months ago
0
TypeError: argument of type 'PDFObjRef' is not iterable
#1120
ibecav
closed
2 months ago
5
Bump black from 22.3.0 to 24.3.0
#1116
dependabot[bot]
opened
3 months ago
0
Got different result of "page.to_image()" on MacOS and Linux
#1115
zqkou
opened
3 months ago
7
Custom deduppe_chars char properties
#1114
felix-hh
opened
3 months ago
4
About paragraph recognition
#1111
jyyang621
closed
3 months ago
0
Table extraction bug when lines are just barely end-to-end
#1110
jsvine
opened
3 months ago
0
Add `autodetect_direction` option to text-extraction methods
#1109
jsvine
opened
3 months ago
0
Any way to detect formatting?
#1108
enrac5
closed
3 months ago
1
bottom-to-top text in cell rendered in wrong location in extract_text()
#1102
stef
closed
3 months ago
6
The position of numbers and punctuation marks is incorrect
#1098
lbeing
closed
3 months ago
2
page.search("text", regex = True) is magnitudes slower in 0.10.4 compared to 0.10.3.
#1097
mikejokic
closed
4 months ago
5
Extracting table with no vertical lines (only horizontal lines) doesn't work
#1096
jjjhill
closed
4 months ago
0
Some utility methods for logical structure
#1095
dhdaines
closed
4 months ago
5
Handle missing ParentTree
#1094
dhdaines
closed
4 months ago
2
Explicitly close `pypdfium2.PdfDocument` in `get_page_image`
#1090
dhdaines
closed
4 months ago
2
`Page.to_image()` leaks file descriptors
#1089
dhdaines
closed
4 months ago
7
Extracting devnagiri text.
#1083
aumungray
closed
5 months ago
3
Bump jupyterlab from 3.4.2 to 3.6.7
#1082
dependabot[bot]
closed
4 months ago
1
debug_tablefinder is weirdly offset
#1078
px-xp
closed
5 months ago
2
can' resolve pdf encoded in ETenms-B5-H
#1073
JasonYZheng
closed
5 months ago
1
page.to_image() causes error "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process "
#1072
domdrag
closed
4 months ago
5
Lots of whitespaces in between words
#1066
fintech132
closed
6 months ago
0
suggest page.extract_words() word sequence same as page.extract_text()
#1064
rahxphoon
closed
6 months ago
0
Pickle implementation for PDF and Page objects
#1059
rajathsalegame
opened
6 months ago
4
original_path extraction error regarding LTCurve
#1057
KaboChow
opened
6 months ago
2
Why is the order of extracting the contents in the table cells wrong?
#1056
xielulu1994
closed
6 months ago
0
Recognize workflow images created by MS visio as text
#1055
a4073631
closed
6 months ago
0
Page cropbox is not used for bbox if present
#1054
stefanw
closed
5 months ago
3
The analysis results of production environment and development environment are different, whether there is a system difference problem resulting in the same version
#1053
jameslun
closed
7 months ago
1
[Feature] Add `Column` object(s) to `find_table()`
#1050
Pk13055
opened
7 months ago
1