Closed by bbfrog 1 month ago
Except for page 7 (0-based), none of the pages contains an image. What you see are vector graphics - no images.
Vector graphics cannot be extracted. All you can do is make a "photo" of the respective page area ...
The Acrobat API can extract vector graphics and save them as PNG or SVG. How does it do this? Would it be hard to support in PyMuPDF? Thanks!
You can try this script. Or do this:
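import pymupdf

doc = pymupdf.open("input.pdf")
for page in doc:
    # each bbox is a rectangle around a cluster of nearby vector drawings
    for i, bbox in enumerate(page.cluster_drawings()):
        # render only that page area to a raster image
        pix = page.get_pixmap(clip=bbox, dpi=150)
        pix.save(f"{doc.name}-{page.number}-{i}.png")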
Thanks very much, @JorjMcKie. It works and extracts the image I want. But it also extracts the tables in this PDF as drawings. Is there any field that can differentiate tables from other drawings? Thanks!
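One possible approach, sketched under assumptions: `cluster_drawings()` returns plain rectangles with no type field, so the sketch below asks `page.find_tables()` for table bounding boxes and skips any cluster that one of them mostly covers. The helper names (`overlap_ratio`, `drop_table_clusters`, `save_non_table_drawings`) and the 0.5 threshold are illustrative and not part of PyMuPDF. Table detection is heuristic, so results will vary by document.

```python
def overlap_ratio(box, other):
    """Fraction of `box` area covered by `other`; boxes are (x0, y0, x1, y1)."""
    x0 = max(box[0], other[0])
    y0 = max(box[1], other[1])
    x1 = min(box[2], other[2])
    y1 = min(box[3], other[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0


def drop_table_clusters(clusters, table_boxes, threshold=0.5):
    """Keep clusters that no table box covers by more than `threshold`."""
    return [
        c for c in clusters
        if all(overlap_ratio(c, t) <= threshold for t in table_boxes)
    ]


def save_non_table_drawings(path, dpi=150):
    """Save each drawing cluster that does not look like a table as a PNG."""
    import pymupdf  # imported here so the helpers above work without it

    doc = pymupdf.open(path)
    for page in doc:
        # bounding boxes of the tables PyMuPDF detects on this page
        table_boxes = [tuple(t.bbox) for t in page.find_tables().tables]
        clusters = [tuple(r) for r in page.cluster_drawings()]
        for i, bbox in enumerate(drop_table_clusters(clusters, table_boxes)):
            pix = page.get_pixmap(clip=bbox, dpi=dpi)
            pix.save(f"{doc.name}-{page.number}-{i}.png")
```

Calling `save_non_table_drawings("input.pdf")` would then write only the clusters that are not mostly covered by a detected table. Raising or lowering `threshold` trades missed tables against wrongly dropped figures.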
Description of the bug
Monaleesa_full.pdf: PyMuPDF can't extract the images on page 2 and page 4 of this PDF.
How to reproduce the bug
import pymupdf

doc = pymupdf.open('Monaleesa_full.pdf')
page_num = 0
for page in doc:
    page_num += 1
    images = page.get_images(full=True)
    print(f'page {page_num}: {len(images)} images')
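`get_images()` reports only raster images, so a page whose figures are vector graphics yields a count of 0. A rough way to check which case applies is to compare the image count with the number of drawing paths returned by `page.get_drawings()`. This is a minimal sketch; the names `describe_page` and `report` are illustrative and not part of PyMuPDF.

```python
def describe_page(n_images, n_drawings):
    """Label a page by the kind of graphics it holds."""
    if n_images and n_drawings:
        return "images and vector graphics"
    if n_images:
        return "images only"
    if n_drawings:
        return "vector graphics only"
    return "no graphics"


def report(path):
    """Print one line per page saying what kind of graphics it contains."""
    import pymupdf  # imported lazily; describe_page needs no PDF library

    doc = pymupdf.open(path)
    for page in doc:
        n_images = len(page.get_images(full=True))
        n_drawings = len(page.get_drawings())
        # page.number is 0-based
        print(f"page {page.number}: {describe_page(n_images, n_drawings)}")
```

A page reported as "vector graphics only" has nothing for `get_images()` to return. Its figures can only be rendered from the page area.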
PyMuPDF version
1.24.11
Operating system
macOS
Python version
3.12