pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

page.insert_image: wrong insertion point, drawing position is wrong when using a particular PDF document #3288

Closed izerui closed 6 months ago

izerui commented 6 months ago

Description of the bug

The correct result should look like this: WX20240321-144411@2x

But the actual result is: WX20240321-145204@2x

I think it has something to do with this PDF, because other PDFs work normally.

Thanks!

How to reproduce the bug

import fitz  # PyMuPDF
import httpx


def get_url_file_for_retry(url):
    """
    Fetch the file content from a URL, retrying up to 5 times.
    :param url: URL of the file
    :return: the httpx response
    """
    for _ in range(5):
        try:
            resp = httpx.get(url)
            return resp
        except Exception:
            continue
    raise RuntimeError(f'{url} file download failed')


# open the pdf
doc = ...
page = doc[0]
# processor ...
# target rectangle for the image, in page coordinates
p = [0, 0, 500, 500]
rect = fitz.Rect(float(p[0]), float(p[1]), float(p[2]), float(p[3]))
img_url = 'https://cdn.pixabay.com/photo/2023/11/09/19/36/zoo-8378189_1280.jpg'
response = get_url_file_for_retry(img_url)
img_pixmap = fitz.Pixmap(response.content)
page.insert_image(rect, pixmap=img_pixmap, keep_proportion=False, alpha=0)
# view the result
doc.save(...)

The PDF is: CS01-P3-003.pdf

PyMuPDF version

1.23.26

Operating system

macOS

Python version

3.10

izerui commented 6 months ago

This works, but it's not a perfect solution.

# Workaround: rebuild the document as a fresh PDF before inserting the image
try:
    doc = fitz.open('pdf', doc.convert_to_pdf())
except Exception as e:
    logger.warning(f'Failed to process annotations: {e!r}')

JorjMcKie commented 6 months ago

You must clean the page obviously before you insert anything. Please read the documentation!
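
For readers landing here, the advice above could be sketched roughly as follows. This is a minimal sketch, not the maintainer's own code: it assumes that "clean the page" refers to `Page.clean_contents()`, and the helper name `insert_image_on_clean_page` is hypothetical. The idea is that existing page content may leave the coordinate system in a changed state, so the page is normalized before anything new is appended.

```python
def insert_image_on_clean_page(page, rect, image_bytes):
    """Clean the page's content, then insert the image into rect.

    `page` is assumed to behave like a PyMuPDF Page object.
    The helper name is hypothetical and not part of PyMuPDF.
    """
    # Normalize the existing page content first, so that leftover
    # graphics state does not shift or scale the inserted image.
    page.clean_contents()
    # stream= takes the raw image bytes; keep_proportion=False
    # stretches the image to fill rect, as in the report above.
    return page.insert_image(rect, stream=image_bytes, keep_proportion=False)
```

With the reproduction code from the report, this would be called as `insert_image_on_clean_page(doc[0], fitz.Rect(0, 0, 500, 500), response.content)` before `doc.save(...)`. The PyMuPDF documentation on misplaced insertions describes which cleaning or wrapping method fits a given version.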

izerui commented 6 months ago

Thank you. That would be nice