pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

The image generated by get_pixmap() is abnormal #3853

Closed 1339503169 closed 3 weeks ago

1339503169 commented 3 weeks ago

Description of the bug

Here is the original PDF: error.pdf

Image generated by get_pixmap(): [image: error]

What it looks like in WPS: [image]

When I open this file in a viewer such as WPS, it displays normally. However, the image generated by get_pixmap() looks very strange, and the text returned by get_text() is also problematic.

How to reproduce the bug

    import fitz

    original_pdf = "path/to/pdf"
    doc = fitz.open(original_pdf)
    page = doc.load_page(0)
    image = page.get_pixmap()

PyMuPDF version

1.24.5

Operating system

Windows

Python version

3.8

JorjMcKie commented 3 weeks ago

This PDF contains severe errors which prevent any meaningful processing.
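
For readers who want to check whether MuPDF itself reports problems with a file like this, the following is a minimal sketch, not a definitive diagnostic. It assumes PyMuPDF is installed under the legacy fitz import name used in the report. The helper name render_with_diagnostics and its parameters are hypothetical. It renders one page to a PNG for side-by-side comparison with another viewer, and returns the library's own repair flag and collected warning text.

```python
def render_with_diagnostics(pdf_path, png_path, page_number=0):
    """Render one page to PNG and collect MuPDF's messages (hypothetical helper)."""
    # Imported here so the helper can be defined even where PyMuPDF is absent.
    import fitz  # PyMuPDF, legacy import name as used in the report

    # Clear messages left over from earlier calls in this process.
    fitz.TOOLS.reset_mupdf_warnings()

    doc = fitz.open(pdf_path)
    try:
        page = doc.load_page(page_number)
        pix = page.get_pixmap()
        pix.save(png_path)  # output format is taken from the file extension
        return {
            # True if MuPDF had to repair the file structure while opening it.
            "repaired": bool(doc.is_repaired),
            # Warnings and errors MuPDF emitted during open and render.
            "warnings": fitz.TOOLS.mupdf_warnings(),
        }
    finally:
        doc.close()
```

Calling render_with_diagnostics("path/to/pdf", "page0.png") and printing the returned dictionary may give a first hint of what is wrong. An empty warning string does not prove that the file is well formed.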