pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Segmentation Fault in add_redact_annot #4047

Status: Closed (TheLastAurora closed this 1 week ago)

TheLastAurora commented 1 week ago

Description of the bug

Calling page.add_redact_annot() on a specific PDF page results in a segmentation fault (core dumped).

How to reproduce the bug

from pathlib import Path

import pymupdf

# Make sure the output directory exists, then collect the input PDFs.
out_dir = Path("./pdfs/out")
out_dir.mkdir(parents=True, exist_ok=True)
files = [x for x in Path("./pdfs").glob("*") if x.is_file()]


def replace_table_text(page: pymupdf.Page) -> pymupdf.Page:
    # Use the first font on the page if it is a Base-14 font, else Courier.
    fontname = page.get_fonts()[0][3]
    if fontname not in pymupdf.Base14_fontnames:
        fontname = "Courier"
    # Redact every "|" and ask for a centered space as replacement text.
    hits = page.search_for("|")
    for rect in hits:
        page.add_redact_annot(
            rect, " ", fontname=fontname, align=pymupdf.TEXT_ALIGN_CENTER, fontsize=10
        )  # Segmentation Fault...
    page.apply_redactions()
    return page


doc = pymupdf.Document(files[0])
replace_table_text(doc[0])

file: test-1-24.pdf

PyMuPDF version

1.24.13

Operating system

Linux

Python version

3.12

TheLastAurora commented 1 week ago

I confirm that it works with PyMuPDF==1.24.5 and pymupdf4llm==0.0.16.

JorjMcKie commented 1 week ago

What has pymupdf4llm to do with this?

You are aware that you are trying to crack a walnut with a sledgehammer? You just want to remove the "|" characters, right? Then do not try to write a space in the same place, and do not additionally request that this space be centered in the available area 😂.

But you probably were just kidding and intended to fool the method with an impossible task. So thanks for the opportunity to make the method more watertight!
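
The suggestion above (remove the "|" characters without writing any replacement text) could be sketched roughly as follows. This is only an illustration, not the maintainers' recommended code: the helper name remove_text is hypothetical, and it relies on page.add_redact_annot() accepting a rectangle alone, with no text, font, or alignment arguments.

```python
def remove_text(page, needle: str) -> int:
    """Redact every occurrence of `needle` on `page` without replacement text.

    `page` is expected to behave like a pymupdf.Page. The function name is
    hypothetical and only serves this example.
    Returns the number of occurrences that were redacted.
    """
    hits = page.search_for(needle)
    for rect in hits:
        # Only the rectangle is passed: no replacement text, no font,
        # no alignment, so nothing has to be fitted into the tiny area.
        page.add_redact_annot(rect)
    if hits:
        # Redaction annotations only take effect once they are applied.
        page.apply_redactions()
    return len(hits)
```

With PyMuPDF installed, it might be used as remove_text(doc[0], "|") on a document opened via pymupdf.Document(...), followed by doc.save(...) to write the result.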

TheLastAurora commented 1 week ago

> What has pymupdf4llm to do with this?

Sorry, I'm using it in another part of the code that works fine, so it has nothing to do with this issue; the problem is in PyMuPDF only.

julian-smith-artifex-com commented 1 week ago

Fixed in PyMuPDF-1.24.14.
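
To check whether an installed PyMuPDF is at or beyond the release named above, a small version comparison might help. This is a sketch under assumptions: the helper names are hypothetical, and version strings are assumed to be dotted numbers such as "1.24.13". Note that it only tests "at least 1.24.14"; older releases that never had the problem (1.24.5 was reported as working in this thread) will also come back as False.

```python
from importlib import metadata
from itertools import takewhile

# Release that this thread reports as containing the fix.
FIXED_IN = (1, 24, 14)


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn a dotted version string like '1.24.13' into (1, 24, 13)."""
    numbers = []
    for piece in version.split("."):
        # Keep only the leading digits, so a piece like '14rc1' yields 14.
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def is_at_least_fixed_release(version: str) -> bool:
    """True if `version` is 1.24.14 or newer."""
    return version_tuple(version) >= FIXED_IN


def installed_pymupdf_version():
    """Return the installed PyMuPDF version string, or None if not installed."""
    try:
        return metadata.version("PyMuPDF")
    except metadata.PackageNotFoundError:
        return None
```

A caller could combine the two, for example by passing the result of installed_pymupdf_version() to is_at_least_fixed_release() when it is not None, and upgrading otherwise.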