jsvine pdfplumber issues

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

MIT License

5.99k stars 618 forks

issues

Offset in bounding boxes for every pdf passed

#1049 yashsandansing closed 6 months ago
2
[Documentation] `annot` properties

#1044 Pk13055 closed 7 months ago
5
Always converts the pattern 'ti' to the number 5 when extracting text, e.g. question to ques5on, solution to solu5on.

#1043 rucwangw closed 7 months ago
2
mixed documentation of PDF.close() and Page.flush_cache() in README

#1042 luketudge closed 7 months ago
3
Add `x_tolerance_ratio` param to `extract_text` and similar functions (now properly linted!)

#1041 afriedman412 closed 7 months ago
2
adding extract_text_dir_sensitive

#1040 afriedman412 closed 4 months ago
9
Add `x_tolerance_ratio` param to `extract_text` and similar functions

#1037 afriedman412 closed 8 months ago
0
Add `Page.table_of_contents`

#1034 jsvine opened 8 months ago
0
Issue with importing pdfplumber library

#1033 FatemaD1577 closed 8 months ago
6
Add `gswin64c` as another possible GS executable

#1032 echedey-ls closed 8 months ago
2
ghostscript x64 not found on Windows, even though it is on %PATH%

#1031 echedey-ls closed 8 months ago
0
`extract_text(extra_attrs=["size"])` raises a parsing error

#1030 RitaMarques closed 8 months ago
2
Merge v0.10.3 into stable

#1029 jsvine closed 8 months ago
1
extract_table omits the last row that is incomplete

#1024 chengtie closed 8 months ago
1
No corresponding JavaScript version available?

#1023 jameslun closed 8 months ago
1
pdfplumber characters missing ( for Chinese character )

#1022 mosescha closed 8 months ago
9
Possible to output to new pdf

#1021 yezhengli-Mr9 closed 8 months ago
1
Update link causing 404

#1020 hussainshaikh12 closed 8 months ago
2
Possibility to extract table of contents

#1018 Alssndr0 closed 8 months ago
2
Add `.extract_table(...)` logic to avoid assigning characters to multiple cells

#1013 jsvine closed 8 months ago
1
extract table headings along with table contents

#1008 poojitharamachandra closed 8 months ago
0
When I use the extract_text function, the x_tolerance argument doesn't work for me.

#1004 papandadj closed 8 months ago
0
Update README.md

#1003 jakobdo closed 7 months ago
4
Polygons other than rects for crop (etc)

#1001 pseudomonas opened 9 months ago
3
Functions that can be multi-threaded - Enhancement to documentation

#995 sandzone opened 9 months ago
5
Extract tables not extracting particular format of tables

#993 John-Peter-R closed 9 months ago
1
Help! Parsing a file produces duplicated text

#992 PeifengRen closed 6 months ago
5
For text extraction, add fractional versions of `x/y_tolerance` arguments

#987 jsvine opened 9 months ago
14
Improving PDF-to-Text Conversion: Integrating Tables as Markup Text on a Page-by-Page Basis

#984 Isha09Garg opened 9 months ago
1
Respect `use_text_flow` in `extract_text`

#983 dhdaines closed 9 months ago
3
`extract_text(use_text_flow=True)` apparently does nothing

#982 dhdaines closed 9 months ago
2
Extract table merged cells

#979 John-Peter-R opened 10 months ago
4
Header missing

#977 blockchainDevCDST closed 10 months ago
0
Add --structure-text flag to CLI (like `pdfinfo -struct-text` but better)

#967 dhdaines closed 10 months ago
4
fix issue 964

#965 jnhyperion opened 11 months ago
2
extracted word is broken

#964 jnhyperion closed 10 months ago
7
Support for PDF 1.3 logical structure

#963 dhdaines closed 7 months ago
7
Support for marked content section IDs

#961 dhdaines closed 10 months ago
4
Cryptography can't find file.

#960 mdevore300 opened 11 months ago
5
update form-parsing example in README

#958 jeremybmerrill closed 10 months ago
11
When I try to use extract_words(), I can't get some text

#956 fangjiyuan opened 11 months ago
6
UnicodeEncodeError: 'charmap' codec can't encode character '\uf0b7' in position 908: character maps to <undefined>

#953 jchristn closed 10 months ago
20
Merge v0.10.2

#951 jsvine closed 11 months ago
1
.to_image() treats a stream as a regular file

#948 Urbener closed 11 months ago
3
Accept Iterable for geometry utils (fixes #945)

#946 dhdaines closed 11 months ago
3
`pdfplumber.utils` functions should take `Iterable` and not `List` arguments

#945 dhdaines closed 11 months ago
1
Extracting table with vertical texts gives unreadable result

#942 Dragon2fly opened 11 months ago
9
Get the Text associated with the hyperlinks - PdfPlumber

#940 mukundhareddy1996 closed 11 months ago
2
Add support for structure tree and marked content sections

#937 dhdaines closed 11 months ago
9
v0.10.0

#936 jsvine closed 11 months ago
1
