jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

MIT License

5.97k stars, 618 forks

issues

add ignore_char_properties arg in dedupe_chars

#1161 QuentinAndre11 opened 6 days ago
5
Incorrect Annotation Coordinates on Landscape Pages

#1160 ongmk closed 6 days ago
0
Error while extracting non-English (Arabic) tables: reversed text

#1159 iiAmeer opened 1 week ago
3
Failed to extract the table without left and right borders.

#1157 sunjingji closed 2 weeks ago
1
Multiple letters extracted on PDF table by using extract_text

#1155 ervinwirth closed 2 weeks ago
5
How can extract_tables remove slanted text from a row?

#1153 zyc1128 closed 2 weeks ago
0
unsupported operand type(s) for *: 'float' and 'PSLiteral'

#1148 sasuke00 closed 1 week ago
3
I got a bug when parsing a PDF

#1147 idiotTest closed 3 weeks ago
2
When I set repair=true, there is an error: 'utf-8' codec can't decode byte 0xae in position 239: invalid start byte. Is it because of the original PDF?

#1145 zyc1128 opened 1 month ago
1
page.img seems to have no key for the image itself, only some image information. How do I get the image objects in a PDF?

#1144 zhaoyuchen1128 closed 3 weeks ago
0
Iterating over page.chars one by one misses the chars in some tables; among identically formatted tables, some cannot be retrieved

#1143 zyc1128 closed 3 weeks ago
0
Demonstrations / Examples - links are not available

#1142 koda0601 closed 3 weeks ago
2
page.to_image() PDFium: Data format error

#1140 Hucley closed 1 month ago
3
Concatenating cropped page objects

#1138 Phylanxy closed 1 month ago
1
char needs a linewidth attr

#1137 xuehuiareafred opened 1 month ago
4
Text with imaginary lines is being treated as a table

#1136 sachinnethakanipersonal closed 1 month ago
0
Nothing found in a PDF. No pages, no chars, nothing.

#1135 AppleRabbitDENG closed 1 month ago
3
Extracting text from PDFs with Identity-H or Roman encodings fails and gives a blank response.

#1132 nishantkumar21stjul closed 1 month ago
1
Function "extract_words" extracts words that don't exist in a PDF.

#1129 HKAFITGlitter closed 1 month ago
2
table_settings support draw line per page

#1121 rahxphoon closed 2 months ago
0
TypeError: argument of type 'PDFObjRef' is not iterable

#1120 ibecav closed 2 months ago
5
Bump black from 22.3.0 to 24.3.0

#1116 dependabot[bot] opened 3 months ago
0
Got different result of "page.to_image()" on MacOS and Linux

#1115 zqkou opened 3 months ago
7
Custom dedupe_chars char properties

#1114 felix-hh opened 3 months ago
4
About paragraph recognition

#1111 jyyang621 closed 3 months ago
0
Table extraction bug when lines are just barely end-to-end

#1110 jsvine opened 3 months ago
0
Add `autodetect_direction` option to text-extraction methods

#1109 jsvine opened 3 months ago
0
Any way to detect formatting?

#1108 enrac5 closed 3 months ago
1
bottom-to-top text in cell rendered in wrong location in extract_text()

#1102 stef closed 3 months ago
6
The position of numbers and punctuation marks is incorrect

#1098 lbeing closed 3 months ago
2
page.search("text", regex = True) is orders of magnitude slower in 0.10.4 compared to 0.10.3.

#1097 mikejokic closed 4 months ago
5
Extracting table with no vertical lines (only horizontal lines) doesn't work

#1096 jjjhill closed 4 months ago
0
Some utility methods for logical structure

#1095 dhdaines closed 4 months ago
5
Handle missing ParentTree

#1094 dhdaines closed 4 months ago
2
Explicitly close `pypdfium2.PdfDocument` in `get_page_image`

#1090 dhdaines closed 4 months ago
2
`Page.to_image()` leaks file descriptors

#1089 dhdaines closed 4 months ago
7
Extracting Devanagari text.

#1083 aumungray closed 5 months ago
3
Bump jupyterlab from 3.4.2 to 3.6.7

#1082 dependabot[bot] closed 4 months ago
1
debug_tablefinder is weirdly offset

#1078 px-xp closed 5 months ago
2
Can't resolve PDF encoded in ETenms-B5-H

#1073 JasonYZheng closed 5 months ago
1
page.to_image() causes error "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process"

#1072 domdrag closed 4 months ago
5
Lots of whitespace between words

#1066 fintech132 closed 6 months ago
0
Suggestion: page.extract_words() should return words in the same sequence as page.extract_text()

#1064 rahxphoon closed 6 months ago
0
Pickle implementation for PDF and Page objects

#1059 rajathsalegame opened 6 months ago
4
original_path extraction error regarding LTCurve

#1057 KaboChow opened 6 months ago
2
Why is the order of extracting the contents in the table cells wrong?

#1056 xielulu1994 closed 6 months ago
0
Recognize workflow images created by MS Visio as text

#1055 a4073631 closed 6 months ago
0
Page cropbox is not used for bbox if present

#1054 stefanw closed 5 months ago
3
Analysis results differ between the production and development environments on the same version; could a system difference be the cause?

#1053 jameslun closed 7 months ago
1
[Feature] Add `Column` object(s) to `find_table()`

#1050 Pk13055 opened 7 months ago
1