jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

Some additional documentation in `pdfplumber.Page.to_image(**conversion_kwargs)`? #760

Closed: jwestwsj closed this issue 1 year ago

jwestwsj commented 1 year ago

I ran into a tiny issue that might be obvious to others (but wasn't to me), and I think a line in the docs would go a long way toward helping others.

Right now, this:

```python
# `page` is a pdfplumber Page, e.g. pdfplumber.open("file.pdf").pages[0]
im = page.to_image(resolution=300)
im.save('path/to/image.png', format='png')
```

will produce a 72 DPI PNG file, not (as you might want) a 300 DPI PNG file.

It's an eminently solvable issue. All you need to do is convert to TIFF rather than PNG (as I probably should have been doing from the jump), but it might be a good idea to note in the docs that Wand does not respect the `resolution` keyword argument when you're making a PNG, especially since the Wand docs are no help.

Again, this might be obvious, but maybe a line wouldn't hurt?

Thanks again for this excellent library!

jsvine commented 1 year ago

Thanks for the report, @jwestwsj! Just to check whether I'm understanding: You're getting the correct number of pixels in the image, but the PNG written by Wand does not contain the necessary metadata indicating that the file should be considered 300 DPI rather than 72 DPI? Or do I misunderstand the situation?
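One way to tell these two situations apart is to inspect the PNG file itself. The sketch below is a minimal, standard-library-only illustration (it does not call pdfplumber or Wand), and the helper names `iter_chunks`, `png_info`, and `expected_pixels` are hypothetical. It relies on the PNG format: pixel dimensions are stored in the IHDR chunk, while resolution density, if present at all, lives in an optional pHYs chunk measured in pixels per meter. The page-size arithmetic assumes dimensions in PDF points (1/72 inch), the unit pdfplumber uses for `page.width` and `page.height`.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METERS_PER_INCH = 0.0254


# Hypothetical helper, not a pdfplumber/Wand API.
def iter_chunks(data: bytes):
    """Yield (chunk_type, chunk_data) for each PNG chunk, checking each CRC."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        chunk_data = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(chunk_type + chunk_data) != crc:
            raise ValueError(f"bad CRC in {chunk_type!r} chunk")
        yield chunk_type, chunk_data
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC


# Hypothetical helper: pixel size from IHDR, DPI from pHYs (None if absent).
def png_info(data: bytes):
    """Return (width_px, height_px, dpi); dpi is None when no metric pHYs chunk exists."""
    width = height = dpi = None
    for chunk_type, chunk_data in iter_chunks(data):
        if chunk_type == b"IHDR":
            width, height = struct.unpack(">II", chunk_data[:8])
        elif chunk_type == b"pHYs":
            ppm_x, ppm_y, unit = struct.unpack(">IIB", chunk_data)
            if unit == 1:  # 1 = pixels per meter; 0 = aspect ratio only
                dpi = (round(ppm_x * METERS_PER_INCH), round(ppm_y * METERS_PER_INCH))
    return width, height, dpi


# Hypothetical helper: expected pixel size of a page measured in points.
def expected_pixels(width_pt: float, height_pt: float, resolution: float):
    """Pixel size of a page (in 1/72-inch points) rendered at `resolution` DPI."""
    return round(width_pt * resolution / 72), round(height_pt * resolution / 72)
```

If `png_info` reports the width and height that `expected_pixels` predicts but `None` for the DPI, the file has the right number of pixels and is only missing the density metadata.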

jsvine commented 1 year ago

Just re-upping, and also flagging re. the new arguments (and documentation) in v0.8.0: https://github.com/jsvine/pdfplumber/releases/tag/v0.8.0

jwestwsj commented 1 year ago

Wow, did I leave this hanging. Sorry! The issue is that the PNG written by Wand does not respect the `resolution` keyword argument: no matter what you put there, it produces a 72 DPI PNG file. If you specify a TIFF (or presumably a JPG, though I didn't try), it respects DPI. My suggestion is that we add a line to the `.to_image()` docs indicating that `resolution` is not respected if you're creating a PNG.

jsvine commented 1 year ago

Thanks for the clarification! What do you think of adding the following note to the documentation?

Note: pdfplumber passes the resolution parameter to Wand, the Python library we use for image conversion. Wand will create the image with the desired number of total pixels of height/width, but does not fully respect the resolution in the strict sense of that word: Although PNGs are capable of storing an image's resolution density as metadata, Wand's PNGs do not.
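For versions affected by this, one possible workaround is to add the missing pHYs chunk to the saved PNG after the fact. This is a sketch under stated assumptions, not part of pdfplumber's or Wand's API: `make_chunk` and `set_png_dpi` are hypothetical helpers, any existing pHYs chunk is replaced, and the new one is written directly after IHDR, which satisfies the PNG rule that pHYs must precede the first IDAT chunk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


# Hypothetical helper, not a pdfplumber/Wand API.
def make_chunk(chunk_type: bytes, chunk_data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(chunk_type + chunk_data) & 0xFFFFFFFF
    return struct.pack(">I", len(chunk_data)) + chunk_type + chunk_data + struct.pack(">I", crc)


# Hypothetical helper: returns new PNG bytes declaring `dpi` in both directions.
def set_png_dpi(data: bytes, dpi: float) -> bytes:
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    ppm = round(dpi / 0.0254)  # pHYs stores pixels per meter
    phys = make_chunk(b"pHYs", struct.pack(">IIB", ppm, ppm, 1))  # unit 1 = meter
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if chunk_type != b"pHYs":  # drop any existing density chunk
            out.append(data[pos:end])
        if chunk_type == b"IHDR":
            out.append(phys)  # pHYs must come before the first IDAT
        pos = end
        if chunk_type == b"IEND":
            break
    return b"".join(out)
```

For the file from the original report, you would read `path/to/image.png` in binary mode, pass its bytes to `set_png_dpi(data, 300)`, and write the result back out. Working on raw bytes keeps the pixel data untouched, so only the metadata changes.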

Or does that still not quite get at what you're noticing?

jwestwsj commented 1 year ago

That's perfect. Thanks!

jsvine commented 1 year ago

Super! Now added in https://github.com/jsvine/pdfplumber/commit/25682fff36e8412b65ab3a86a9cbeabd482c2a7f

jsvine commented 1 year ago

FYI, the DPI-metadata-in-PNG issue should now be fixed in v0.10.0 🎉