Closed linuxsoftware closed 2 years ago
Hi @linuxsoftware, and thanks for flagging this! Since the default seems to work well for most PDFs, I'd lean toward an approach that allows the user to specify the conversion mode via an argument passed to get_page_image(...) and Page.to_image(...). I'll put this on my todo list, though you're also welcome to submit a PR.
I was thinking about this and realized it is already possible to pass a user-created original image into to_image, so perhaps the code does not need to change at all.
e.g.
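from io import BytesIO
import PIL.Image
import wand.image

def my_page_image(page):
    # Render the page with ImageMagick as an 8-bit PNG that Pillow can convert to RGB.
    stream = page.pdf.stream
    page_no = page.page_number - 1  # ImageMagick page indices are zero-based
    with wand.image.Image(resolution=150,
                          filename=f"{stream.name}[{page_no}]") as img:
        with img.convert("png8") as png:
            im = PIL.Image.open(BytesIO(png.make_blob()))
            return im.convert("RGB")

pi = page.to_image(original=my_page_image(page))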
The main thing is for the user to be aware of Pillow's 8-bit limitation when converting images. Perhaps it is enough that this conversation will now show up in searches, or perhaps it's worth a note in the Visual Debugging documentation?
I believe that the latest version(s) of pdfplumber, which make some more generalized improvements/changes, now convert your PDF to an acceptable image.
Thank you for this extremely useful library.
I had a problem with visual debugging of a PDF that was mostly grey. All the text turned white so it could not be seen.
Here is an example PDF.
The problem is that ImageMagick creates the image of the page as a 16-bit greyscale PNG, but Pillow has a documented issue with converting that to RGB. (See https://stackoverflow.com/questions/19892919/pil-converting-an-image-with-mode-i-to-rgb-results-in-a-fully-white-image and https://github.com/python-pillow/Pillow/issues/3011)
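A rough illustration of why the page can come out white, assuming (as the linked reports suggest) that the conversion clamps 16-bit samples to the 0..255 range rather than rescaling them. The functions and sample values below are plain-Python stand-ins made up for this sketch; they are not Pillow's actual code.

```python
# Illustration only: hypothetical helpers, not part of Pillow or pdfplumber.

def clip_to_8bit(sample: int) -> int:
    """Clamp a sample to 0..255 (the assumed behaviour of the problematic conversion)."""
    return max(0, min(255, sample))


def scale_to_8bit(sample: int) -> int:
    """Rescale a 16-bit sample (0..65535) to 8 bits by dropping the low byte."""
    return sample >> 8


# Made-up 16-bit grey levels: black, dark grey, mid grey, white.
samples_16bit = [0, 12000, 32768, 65535]

clipped = [clip_to_8bit(s) for s in samples_16bit]
scaled = [scale_to_8bit(s) for s in samples_16bit]

print(clipped)  # [0, 255, 255, 255] -> every non-black grey becomes white
print(scaled)   # [0, 46, 128, 255]  -> grey levels are preserved
```

Under that assumption, anything brighter than near-black in a 16-bit image saturates to white, which matches the symptom on a mostly grey page. Producing an 8-bit image before Pillow sees it avoids the clamp altogether.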
My hack has been to change display.py so that ImageMagick creates the image as an 8-bit PNG using convert("png8"), which Pillow can then cope with. This "works for me".
Environment