pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

SegFault 11 when empty H1 H2 H3 H4 etc element is used in insert_htmlbox #3559

Closed cs-shadowbq closed 1 week ago

cs-shadowbq commented 3 weeks ago

Description of the bug

insert_htmlbox crashes with a segmentation fault (signal 11) when the HTML contains an empty heading element (h1, h2, h3, h4, etc.).

How to reproduce the bug

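Assume the following setup:

import pymupdf
doc = pymupdf.Document()
page = doc.new_page()
fullrect = page.rect  # assumption: the report never defines fullrect; the full page rectangle is a guess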

This fails:

>>> text_insert="""<body><h3></h3></body>"""
>>> page.insert_htmlbox(fullrect, text_insert)
Segmentation fault: 11

This passes:

>>> text_insert="""<body><h3>&nbsp;</h3></body>"""
>>> page.insert_htmlbox(fullrect, text_insert)
(701.0720024108887, 1.0)

Other HTML elements (div, span, b, center, ...) do NOT trigger this failure.
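
Since the &nbsp; variant above passes, one possible stopgap on affected versions is to pad empty headings before passing the HTML to insert_htmlbox. The following is only a minimal sketch: pad_empty_headings is a hypothetical helper, not part of PyMuPDF, and it assumes that h5/h6 and whitespace-only headings should be padded as well, which the report does not confirm. It uses a regular expression rather than an HTML parser, so it will miss unusual markup.

```python
import re

# Matches a heading element (h1-h6) whose content is empty or whitespace only,
# e.g. <h3></h3> or <H2 class="x"> </H2>. The closing tag must name the same
# element as the opening tag (backreference \1).
_EMPTY_HEADING = re.compile(r"<(h[1-6])(\s[^>]*)?>\s*</\1\s*>", re.IGNORECASE)


def pad_empty_headings(html: str) -> str:
    """Return html with a non-breaking space inserted into each empty heading.

    Hypothetical helper (not a PyMuPDF API), based on the observation in this
    report that <h3>&nbsp;</h3> does not crash.
    """

    def _pad(match: re.Match) -> str:
        tag = match.group(1)
        attrs = match.group(2) or ""  # keep any attributes on the opening tag
        return f"<{tag}{attrs}>&nbsp;</{tag}>"

    return _EMPTY_HEADING.sub(_pad, html)


if __name__ == "__main__":
    print(pad_empty_headings("<body><h3></h3></body>"))
```

With the setup from this report, the call would then look like page.insert_htmlbox(fullrect, pad_empty_headings(text_insert)).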

PyMuPDF version

1.24.3

Operating system

macOS

Python version

3.8

JorjMcKie commented 3 weeks ago

Confirmed to also happen on Windows.

cs-shadowbq commented 3 weeks ago

upstream 👀 - https://github.com/ArtifexSoftware/mupdf/commit/b084345acbbef5de2fafd310278e8f73d9269c2b

julian-smith-artifex-com commented 1 week ago

Fixed in 1.24.6.
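
For code that has to run on both older and newer installations, a version check along these lines could decide whether a workaround is still needed. This is a sketch only: has_htmlbox_fix is a hypothetical helper, and it assumes the version is available as a dotted string such as "1.24.3" (for example from pymupdf.VersionBind).

```python
import re

# First release reported in this thread to contain the fix.
FIXED_IN = (1, 24, 6)


def has_htmlbox_fix(version_text: str) -> bool:
    """Return True if version_text is at least 1.24.6.

    Hypothetical helper: only the leading "major.minor.patch" digits are
    compared, so any trailing suffix is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_text!r}")
    # Compare as integer tuples so that 1.24.10 sorts after 1.24.6.
    return tuple(int(part) for part in match.groups()) >= FIXED_IN


if __name__ == "__main__":
    print(has_htmlbox_fix("1.24.3"))
    print(has_htmlbox_fix("1.24.6"))
```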