pypdfium2-team / pypdfium2

Python bindings to PDFium
https://pypdfium2.readthedocs.io/

Support for PIL.PPMImage Plugin #146

Closed rushabh-wadkar closed 2 years ago

rushabh-wadkar commented 2 years ago

pil_image = page.render_topil(scale=200/72)

The render_topil call above returns a PIL.Image.Image, which consumes a lot of memory. Is it possible to render to PIL.PPMImage or to a custom PIL plugin? https://pillow.readthedocs.io/en/stable/_modules/PIL/PpmImagePlugin.html
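For a sense of scale, the size of an uncompressed bitmap can be estimated from the page dimensions and the render scale. This is only a rough sketch: the helper name `estimate_bitmap_bytes` is hypothetical, the A4 page size, the rounding up to whole pixels, and the three bytes per pixel are all assumptions, and it ignores any overhead on top of the pixel data.

```python
import math


def estimate_bitmap_bytes(width_pt, height_pt, scale, channels=3):
    """Rough size in bytes of an uncompressed bitmap for one page.

    width_pt / height_pt: page size in PDF points (1/72 inch).
    scale: render scale factor, e.g. 200/72 for roughly 200 DPI.
    channels: bytes per pixel (assumed 3 here; 4 with an alpha channel).
    """
    # Assumption: pixel dimensions are rounded up to whole pixels.
    width_px = math.ceil(width_pt * scale)
    height_px = math.ceil(height_pt * scale)
    return width_px * height_px * channels


# An A4 page is roughly 595 x 842 points (assumed page size for illustration).
a4_bytes = estimate_bitmap_bytes(595, 842, scale=200 / 72)
print(f"{a4_bytes / 2**20:.1f} MiB per page")
```

Under these assumptions, a single page costs on the order of ten MiB, so holding many rendered pages at once adds up quickly.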

mara004 commented 2 years ago

Well, since version 3, you can plug in any custom converter by subclassing BitmapConvBase and passing it to PdfPage.render_to() or PdfDocument.render_to().

However, I am under the impression that PpmImagePlugin is merely an opener/decoder and not a standalone public class. That said, you could save the image to a buffer (or temporary file) in a compressed format and then re-open it with PIL.

mara004 commented 2 years ago

I take it this is still related to https://github.com/pypdfium2-team/pypdfium2/issues/141?

rushabh-wadkar commented 2 years ago

From what I observed, PIL.Image.Image takes about 500 MiB for a PDF, while the same content converted to PIL.PPMImage takes about 300 MiB. So I'm just trying to optimise further!

mara004 commented 2 years ago

I think, for what you want to do, you don't even need a custom converter. You could just save the image with PIL and reopen it, as I said:

import io
import PIL.Image

# assuming raw_image is the uncompressed image obtained via `render_topil()`
buf = io.BytesIO()
raw_image.save(buf, format="ppm")
raw_image.close()
buf.seek(0)
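reopened_image = PIL.Image.open(buf)  # lazy: pixel data is only decoded on first access
# ... (note that PPM itself is uncompressed; use e.g. format="png" for actual compression)
buf.close()  # once finished with reopened_image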

However, I doubt that this provides any advantage at all, because the raw image still has to be held in memory for some time in any case.
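To illustrate that caveat, here is a minimal sketch using only Pillow and a synthetic image in place of a rendered page. The image size and the plain white content are assumptions for illustration; real pages will compress less well. It compares the size of the decoded pixel data with the size of a PNG-encoded buffer, and shows that decoding the reopened image brings the pixel data back to its full size.

```python
import io

from PIL import Image

# Synthetic stand-in for a rendered page (hypothetical size, plain white).
width, height = 1653, 2339
raw_image = Image.new("RGB", (width, height), "white")

# Decoded pixel data: one byte per channel per pixel.
raw_size = len(raw_image.tobytes())

# Encode into an in-memory PNG. The raw image is still alive while saving,
# so peak memory is not reduced by this step.
buf = io.BytesIO()
raw_image.save(buf, format="png")
png_size = len(buf.getvalue())
raw_image.close()

# Image.open() is lazy: it reads the header and defers decoding.
buf.seek(0)
reopened = Image.open(buf)
reopened.load()  # decoding restores the full-size pixel data
decoded_size = len(reopened.tobytes())

print(raw_size, png_size, decoded_size)
```

The encoded buffer is small only while the image stays undecoded. As soon as the pixels are accessed, memory use returns to roughly the raw size, so the saving applies to storage between uses rather than to processing.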