pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

font.valid_codepoints() - malfunction #3933

Open pranctco opened 1 week ago

pranctco commented 1 week ago

Description of the bug

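font.valid_codepoints() has stopped working correctly in the latest version (1.24.11): the script below reports that no font supports any symbols, whereas 1.23.4 reports the expected counts for the same file.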

How to reproduce the bug

code + sample pdf

font_valid_codepoints.zip
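
The attached script is not reproduced in this thread. The following is only a minimal sketch of an equivalent check, assuming the script loads each embedded font from the page and counts the result of font.valid_codepoints(). The names describe() and report_fonts() are hypothetical, and the message wording is copied from the output quoted in this issue.

```python
def describe(name, codepoints):
    """Format one report line in the style of the output quoted in this issue."""
    count = len(codepoints)
    if count == 0:
        return f"Font {name} DOES NOT SUPPORT ANY SYMBOLS"
    return f"Font {name} has {count} supported symbols"


def report_fonts(path):
    """Print one line per embedded font referenced by the pages of the PDF at `path`."""
    # Imported here so describe() can be used without PyMuPDF installed.
    # 'fitz' is the legacy module name and the only one available in 1.23.x.
    try:
        import pymupdf
    except ImportError:
        import fitz as pymupdf

    print(pymupdf.__doc__)  # version banner
    doc = pymupdf.open(path)
    for page in doc:
        fonts = page.get_fonts()
        print(f"{len(fonts)=}")
        for item in fonts:
            # Each item starts with (xref, ext, type, basefont, ...)
            xref, basefont = item[0], item[3]
            # extract_font() returns (basename, ext, type, content)
            content = doc.extract_font(xref)[3]
            if not content:
                continue  # font is not embedded, so there is no buffer to load
            font = pymupdf.Font(fontbuffer=content)
            print(describe(basefont, font.valid_codepoints()))
    print("Done")


# Usage (assumes the sample file from the attachment is in the working directory):
# report_fonts("test-doc-2.pdf")
```

Running the same sketch under both PyMuPDF versions should make the difference visible: an empty result from valid_codepoints() produces the "DOES NOT SUPPORT ANY SYMBOLS" line.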

latest version - malfunction

> python font_valid_codepoints.py test-doc-2.pdf

PyMuPDF 1.24.11: Python bindings for the MuPDF 1.24.10 library (rebased implementation).
Python 3.11 running on win32 (64-bit).

len(page.get_fonts())=9

Font BCDEEE+Calibri DOES NOT SUPPORT ANY SYMBOLS
Font BCDFEE+San-Regu DOES NOT SUPPORT ANY SYMBOLS
Font BCDGEE+San-Ital DOES NOT SUPPORT ANY SYMBOLS
Font BCDHEE+San-Bold DOES NOT SUPPORT ANY SYMBOLS
Font BCDIEE+San-Regu DOES NOT SUPPORT ANY SYMBOLS
Font BCDJEE+Calibri DOES NOT SUPPORT ANY SYMBOLS

Done

previous version - correct result

> python font_valid_codepoints.py test-doc-2.pdf

PyMuPDF 1.23.4: Python bindings for the MuPDF 1.23.2 library.
Version date: 2023-09-26 00:00:01.
Built for Python 3.11 on win32 (64-bit).

len(page.get_fonts())=9

Font BCDEEE+Calibri has 36 supported symbols
Font BCDFEE+San-Regu has 53 supported symbols
Font BCDGEE+San-Ital has 20 supported symbols
Font BCDHEE+San-Bold has 20 supported symbols
Font BCDIEE+San-Regu has 53 supported symbols
Font BCDJEE+Calibri has 36 supported symbols

Done

PyMuPDF version

1.24.11

Operating system

Windows

Python version

3.11

julian-smith-artifex-com commented 4 days ago

Thanks for reporting this. We know how to fix the problem, but it requires a change to MuPDF itself, so it might not make it into the next PyMuPDF release.

I'll update this issue once MuPDF has been updated.

julian-smith-artifex-com commented 7 hours ago

The latest MuPDF master branch has a fix for this.

However, today's release of PyMuPDF-1.24.12 does not use the latest MuPDF master, so it does not fix this issue.

The next release will probably have the fix, but this isn't guaranteed.

[Building current PyMuPDF with current MuPDF master fixes the problem.]
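
To summarise which versions this thread mentions, here is a small sketch that classifies a PyMuPDF version string. It encodes only what is stated above and does not guess which later release contains the fix. The helper thread_status() and both set names are hypothetical.

```python
# Versions named in this thread only; anything else is deliberately "not reported".
REPORTED_BROKEN = {"1.24.11", "1.24.12"}
REPORTED_WORKING = {"1.23.4"}


def thread_status(pymupdf_version):
    """Return what this issue thread says about the given PyMuPDF version."""
    if pymupdf_version in REPORTED_BROKEN:
        return "reported broken"
    if pymupdf_version in REPORTED_WORKING:
        return "reported working"
    return "not reported"


# Usage with an installed PyMuPDF (VersionBind holds the binding version string):
# import pymupdf
# print(thread_status(pymupdf.VersionBind))
```

For a version that is "not reported", the only reliable check is to rerun the reproduction script against the sample PDF.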