jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Running the example raises a TypeError #879

Closed: coolinstar closed this issue 1 year ago

coolinstar commented 1 year ago

Describe the bug

I am trying to learn how to extract tables into a dataframe. Running the sample code produces the following error.

Code to reproduce the problem

```python
import pdfplumber
from wand.display import display

pdf = pdfplumber.open("pdfprocess/sample.pdf")

p0 = pdf.pages[0]
im = p0.to_image()
display(im)
```

PDF file

https://www.customs.gov.sg/files/businesses/GuidetoNACWCLicencewithSchChemList.pdf


Expected behavior

As the demo shows, I expect a picture to pop up.

Actual behavior

Error

Screenshots

Error message:

Exception has occurred: TypeError
image must be a wand.image.Image instance, not <pdfplumber.display.PageImage object at 0x000002CD9527D7C0>
  File "C:\somepath\testpdf2.py", line 8, in <module>
    display(im)
TypeError: image must be a wand.image.Image instance, not <pdfplumber.display.PageImage object at 0x000002CD9527D7C0>

Environment

jsvine commented 1 year ago

In what context are you running the code? In a Jupyter-style notebook, `display` should be unnecessary; the `im` object should just render on its own. In a script, you'll want to run `im.save("path/to/file.png")` instead.
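
For the script case, a minimal sketch of the save-to-file approach might look like the following. The function names `png_path_for` and `save_page_image` and the `-pageN.png` naming scheme are hypothetical and only for illustration; the pdfplumber calls used are `pdfplumber.open`, `Page.to_image`, and `PageImage.save`, as in the code and comment above.

```python
from pathlib import Path


def png_path_for(pdf_path: str, page_index: int = 0) -> Path:
    """Hypothetical helper: build an output name such as sample-page1.png next to the PDF."""
    src = Path(pdf_path)
    return src.with_name(f"{src.stem}-page{page_index + 1}.png")


def save_page_image(pdf_path: str, page_index: int = 0) -> Path:
    """Render one page and write it to disk instead of passing it to wand's display()."""
    # Imported here so png_path_for stays usable even where pdfplumber is not installed.
    import pdfplumber

    out_path = png_path_for(pdf_path, page_index)
    with pdfplumber.open(pdf_path) as pdf:
        # to_image() returns a pdfplumber PageImage, not a wand.image.Image,
        # which is why wand's display() rejects it with a TypeError.
        im = pdf.pages[page_index].to_image()
        im.save(str(out_path))
    return out_path
```

Calling `save_page_image("pdfprocess/sample.pdf")` should then write `pdfprocess/sample-page1.png`, which can be opened in any image viewer.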