pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0
5.19k stars, 496 forks

There is an issue with the image generated by the page.get_pixmap() function #2964

Closed 1339503169 closed 6 months ago

1339503169 commented 9 months ago

Description of the bug

Attached file: img_test.pdf

The image produced by page.get_pixmap() contains characters that are not present in the original PDF. The source file shows text that appears to read 'From (Shipper) 发件人', but the rendered image does not match the PDF. The converted image is shown below, with the red box marking the error. You can compare it against img_test.pdf.

image

How to reproduce the bug

Here is the code I used to generate the image:

```python
import fitz  # PyMuPDF

document = fitz.open('./data/img_test.pdf')
page = document.load_page(0)  # first page
rotate = int(0)
zoom_x, zoom_y = 2, 2  # render at 2x scale in both directions
trans = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)
pix = page.get_pixmap(matrix=trans, alpha=False)
pix.save('data/img_test.png')
```

What should I do to get the correct picture?

PyMuPDF version

1.23.7 or earlier

Operating system

Windows

Python version

3.8

JorjMcKie commented 9 months ago

Submitted bug report in https://bugs.ghostscript.com/show_bug.cgi?id=707451.

cbm755 commented 9 months ago

Just FYI, that file renders incorrectly in Evince on Fedora GNU/Linux (which is completely independent of PyMuPDF).

image

JorjMcKie commented 9 months ago

> Just FYI, that file renders incorrectly in Evince on Fedora GNU/Linux (which is completely independent of PyMuPDF).

Thanks for this, Colin. Yeah, maybe there is a general issue with these files. I am sure we will soon hear from our friends at MuPDF.

robinwatts commented 8 months ago

The file does indeed look broken. We have a fix in 1.24 that improves it.

The text now says "1 Front(Shipper)", albeit with dodgy spacing.

Essentially, it's a broken file, and we're doing as well with it as we can.

The commit in question is:

https://git.ghostscript.com/?p=mupdf.git;a=commitdiff;h=0a5b60420

I'll see about pulling this back to 1.23.x so you can get access to it soon.

JorjMcKie commented 8 months ago

The MuPDF team has developed a fix that will at least improve the rendering of this type of page.

julian-smith-artifex-com commented 6 months ago

Fixed in 1.24.0.
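
For anyone landing here later, a minimal sketch of how one might check whether the installed PyMuPDF is at or past the release named above. It uses only the standard library (`importlib.metadata`, available since Python 3.8). The helper names `parse_version`, `has_rendering_fix` and `installed_pymupdf_version` are made up for this illustration and are not part of PyMuPDF. The comparison is deliberately simple and treats a pre-release such as `1.24.0rc1` the same as `1.24.0`, which is an assumption rather than something confirmed in this thread.

```python
import re
from importlib import metadata

# Release named in this thread as containing the fix.
FIXED_IN = (1, 24, 0)


def parse_version(text):
    """Turn a string such as '1.23.7' into a tuple of three ints."""
    parts = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits, so 'rc1' style suffixes are ignored.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.24' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_rendering_fix(version_text):
    """True if the given version is at or past FIXED_IN."""
    return parse_version(version_text) >= FIXED_IN


def installed_pymupdf_version():
    """Return the installed PyMuPDF version string, or None if absent."""
    try:
        return metadata.version("PyMuPDF")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pymupdf_version()
    if version is None:
        print("PyMuPDF is not installed")
    elif has_rendering_fix(version):
        print(f"PyMuPDF {version} should include the improved rendering")
    else:
        print(f"PyMuPDF {version} predates 1.24.0; consider upgrading")
```

Note that, per the comments above, the file itself is broken, so even a fixed version may render the text with odd spacing.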