jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

pdf plumber to_image( ) OSError: exception: access violation writing 0x0000000000000008 #713

Closed jjjkuba closed 2 years ago

jjjkuba commented 2 years ago

Describe the bug

When I try to convert some pages from an opened PDF to images using the to_image() method, it throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\jjjku\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pdfplumber\page.py", line 381, in to_image
    return PageImage(self, **kwargs)
  File "C:\Users\jjjku\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pdfplumber\display.py", line 93, in __init__
    self.original = get_page_image(
  File "C:\Users\jjjku\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pdfplumber\display.py", line 54, in get_page_image
    with WandImage(
  File "C:\Users\jjjku\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\wand\image.py", line 9306, in __init__
    wand = library.NewMagickWand()
OSError: exception: access violation writing 0x0000000000000008
>>>

I tried to invoke this method from PyCharm and from Windows cmd, and the result is the same.

Code to reproduce the problem (code I used in cmd)

>>> import pdfplumber
>>> myPDF = pdfplumber.open("scansmpl.pdf")
>>> myPDF.pages[0].to_image()

PDF file

Basically, it occurred for every PDF file I tried.

scansmpl.pdf

Expected behavior

An image object should be returned rather than an error.

Actual behavior

An error was thrown (see the traceback above).

Environment

Additional context

If this is not a bug but something wrong on my side, then sorry. I tried to find a solution on my side, but without success. I have Magic Wand and Ghostscript installed, and the environment variable for Magic Wand is set, so I don't know what else could be causing this issue.

jsvine commented 2 years ago

Hi @jjjkuba, and thanks for the well-described report. The error appears to stem from one of this library's dependencies, Wand, and appears to have been fixed recently: https://github.com/emcconville/wand/issues/586 & https://github.com/emcconville/wand/issues/587

If you pip install -U wand, does that resolve the error?
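A rough way to confirm which Wand version is installed, and whether Wand fails on its own without pdfplumber involved, might look like the sketch below. The helper names `installed_version` and `wand_smoke_test` are made up for illustration; the assumption is that creating a tiny blank Wand image exercises the same ImageMagick bindings that fail in the traceback above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def wand_smoke_test():
    """Try to create a 1x1 Wand image without involving pdfplumber.

    Returns "ok" on success, otherwise a short description of the failure.
    (Hypothetical helper, not part of Wand or pdfplumber.)
    """
    try:
        # The import itself fails if Wand or ImageMagick is missing.
        from wand.image import Image

        with Image(width=1, height=1):
            return "ok"
    except Exception as exc:  # e.g. ImportError or OSError
        return f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    print("Wand:", installed_version("Wand"))
    print("pdfplumber:", installed_version("pdfplumber"))
    print("Wand smoke test:", wand_smoke_test())
```

If the smoke test reports the same OSError, the problem is likely in Wand or the ImageMagick installation rather than in pdfplumber.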

jjjkuba commented 2 years ago

Yes, it solved the issue. Thanks

maayansharon10 commented 1 year ago

Hello, for me it still doesn't work. I'm working in Google Colab and installed pdfplumber and imagemagick with pip. I had the same error as above, so, as suggested, I ran !pip install -U wand

My code looks like this:

import pdfplumber

pdf = pdfplumber.open("file.pdf")  # Open the PDF.
page = pdf.pages[0]
im = page.to_image()
im

And now I get this error:

---------------------------------------------------------------------------
PolicyError                               Traceback (most recent call last)
<ipython-input-56-61f21a06c7de> in <module>
---> 14 im = page.to_image()

5 frames
/usr/local/lib/python3.7/dist-packages/wand/resource.py in raise_exception(self, stacklevel)
    223             warnings.warn(e, stacklevel=stacklevel + 1)
    224         elif isinstance(e, Exception):
--> 225             raise e
    226 
    227     def make_blob(self, format=None):

PolicyError: not authorized `file.pdf' @ error/constitute.c/ReadImage/412

I would love some help.

jsvine commented 1 year ago

@maayansharon10 Your issue appears to be different than the one discussed here. Instead, please see this thread: https://github.com/jsvine/pdfplumber/issues/81#issuecomment-1312277946

Does that help?
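For context, a PolicyError of the "not authorized" kind generally comes from ImageMagick's security policy (a policy.xml file) denying access to a format, here PDF. The linked thread is not reproduced here; the sketch below only shows one way to inspect what such a policy says about PDF. The function name `pdf_policy_rights` and the sample policy text are illustrative assumptions, and the location of the real policy file varies by system (for example, somewhere under /etc/ImageMagick-6/ on some Linux installs).

```python
import xml.etree.ElementTree as ET


def pdf_policy_rights(policy_xml):
    """Return the `rights` values of coder policies whose pattern covers PDF.

    `policy_xml` is the text of an ImageMagick policy.xml file.
    (Hypothetical helper for illustration only.)
    """
    root = ET.fromstring(policy_xml)
    rights = []
    for policy in root.iter("policy"):
        if policy.get("domain") != "coder":
            continue
        # Patterns may be a single name ("PDF") or a brace list ("{PS,PDF,XPS}").
        pattern = policy.get("pattern", "").strip("{}")
        names = [name.strip().upper() for name in pattern.split(",")]
        if "PDF" in names:
            rights.append(policy.get("rights", ""))
    return rights


# Sample policy text (an assumption, written to resemble a restrictive default).
sample = """
<policymap>
  <policy domain="coder" rights="none" pattern="PDF" />
  <policy domain="resource" name="memory" value="256MiB" />
</policymap>
"""

if __name__ == "__main__":
    print(pdf_policy_rights(sample))
```

A result containing "none" would suggest that the policy blocks reading PDFs, which matches the "not authorized" message in the traceback.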

maayansharon10 commented 1 year ago

Thank you, @jsvine! Sorry, I wasn't aware that this is a different issue and that it had already been discussed, but that solved it. Thanks!