On v1.24.2, calling page.get_text() hangs: the process sits at very high CPU usage and the method never returns, so the script cannot continue.
With version 1.23.5 on https://pymupdf.io/, the text is extracted without problems.
How to reproduce the bug
import fitz

# download the file and save it to local storage
file = "C:/Users/DELL/Downloads/sample_scanned/pdf_1978.pdf"

# open the file with fitz
doc = fitz.open(file)

# print the text of each page
for page in doc:
    print(page.get_text())  # process hangs here
Reproduced on Windows, on Linux, and in a Python 3.11 Docker image.
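In case it helps narrow this down, here is a rough sketch of how the hang could be isolated per page without blocking the calling script. It runs the extraction in a separate interpreter and gives up after a timeout. The names `CHILD_SCRIPT`, `run_child` and `probe_page`, and the 60-second default, are my own assumptions for illustration and not part of PyMuPDF; the only PyMuPDF calls used are `fitz.open` and `page.get_text()`.

```python
import subprocess
import sys

# Script executed in the child interpreter: extract the text of a single
# page and print its length. argv[1] is the PDF path, argv[2] the page index.
CHILD_SCRIPT = """
import sys
import fitz
doc = fitz.open(sys.argv[1])
text = doc[int(sys.argv[2])].get_text()
print(len(text))
"""


def run_child(script, args, timeout):
    """Run `script` in a fresh Python process; return (status, stdout).

    status is "ok", "error" (non-zero exit code) or "timeout".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script, *args],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the exception,
        # so a spinning get_text() call does not linger in the background.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "error"
    return status, proc.stdout


def probe_page(pdf_path, page_number, timeout=60.0):
    """Hypothetical helper: report whether get_text() returns in time."""
    status, _ = run_child(CHILD_SCRIPT, [pdf_path, str(page_number)], timeout)
    return status
```

Calling `probe_page(file, 0)` for each page index should then report `"timeout"` for any page on which `get_text()` never returns, instead of freezing the whole script.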
Description of the bug
I tried extracting text from the following file: Speculative Investor Behavior in a Stock Market with Heterogeneous Expectations.
Edit: attached the file Harrison & Kreps (1978).pdf
PyMuPDF version
1.24.2
Operating system
Linux
Python version
3.11