Closed: macdeport closed this issue 4 months ago
I could not reproduce this with current versions. I also extracted the JPEG embedded in the output file, and it appears to be well-formed according to the djpeg application.
Perhaps libjpeg needs to be upgraded on your machine?
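For anyone who wants to repeat the check described above, here is a minimal sketch of one way to do it, not necessarily how it was done in this thread. It assumes pikepdf is installed and djpeg is on the PATH. The function names are made up for illustration. The marker test is only a rough pre-check (a truncated stream typically lacks the trailing EOI marker) and is much weaker than actually decoding with djpeg.

```python
import shutil
import subprocess


def has_jpeg_markers(data: bytes) -> bool:
    """Rough check: stream starts with SOI (FF D8) and ends with EOI (FF D9)."""
    return data.startswith(b"\xff\xd8") and data.endswith(b"\xff\xd9")


def embedded_jpegs(pdf_path):
    """Yield (page_number, image_name, raw_bytes) for each DCTDecode image.

    Only images referenced directly by a page are visited; images nested
    inside Form XObjects are not covered by this sketch.
    """
    import pikepdf  # third-party; imported lazily so the helper above works without it

    with pikepdf.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for name in page.images.keys():
                obj = page.images[name]
                # A single /DCTDecode filter means the raw stream is a JPEG file.
                if pikepdf.PdfImage(obj).filters == ["/DCTDecode"]:
                    yield page_number, name, obj.read_raw_bytes()


def djpeg_warnings(jpeg_bytes: bytes) -> str:
    """Decode the JPEG with djpeg (reading stdin) and return its stderr text."""
    if shutil.which("djpeg") is None:
        raise RuntimeError("djpeg not found on PATH")
    result = subprocess.run(
        ["djpeg"],
        input=jpeg_bytes,
        stdout=subprocess.DEVNULL,  # discard the decoded image
        stderr=subprocess.PIPE,     # libjpeg warnings are written here
    )
    return result.stderr.decode(errors="replace").strip()
```

To use it, loop over embedded_jpegs("output.pdf") (the file name is a placeholder) and print has_jpeg_markers(data) and djpeg_warnings(data) for each image. An empty string from djpeg_warnings means djpeg printed no warnings for that image.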
Persists in spite of this new configuration:
Python 3.11.9 / ocrmypdf 16.4.3 / pikepdf 9.2.0 / pypdf 4.3.1
jbig2 0.28 / gs 10.03.1 / pngquant 3.0.3
tesseract 5.3.3
leptonica-1.84.1
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 2.1.5.1) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
Found NEON
Found libarchive 3.7.4 zlib/1.3.1 liblzma/5.4.6 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6
Found libcurl/8.9.1 OpenSSL/3.3.1 zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 nghttp2/1.62.1
python311 @3.11.9_0+lto+optimizations (active)
ocrmypdf @16.4.3_0+python311 (active)
py311-pikepdf @9.2.0_0 (active)
py311-pypdf @4.3.1_0 (active)
---
jbig2dec @0.20_0 (active)
ghostscript @10.03.1_0+x11 (active)
pngquant @3.0.3_0 (active)
tesseract @5.3.3_2 (active)
Still exists with this new configuration:
Python 3.11.9 / ocrmypdf 16.5.0 / pikepdf 9.2.1 / pypdf 4.3.1
jbig2 0.28 / gs 10.03.1 / pngquant 3.0.3 / tesseract 5.4.1
leptonica-1.84.1
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 2.1.5.1) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
Found NEON
Found libarchive 3.7.4 zlib/1.3.1 liblzma/5.4.6 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6
Found libcurl/8.9.1 OpenSSL/3.3.1 zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 nghttp2/1.63.0
python311 @3.11.9_0+lto+optimizations (active)
ocrmypdf @16.5.0_0+python311 (active)
py311-pikepdf @9.2.1_0 (active)
py311-pypdf @4.3.1_0 (active)
---
jbig2dec @0.20_0 (active)
ghostscript @10.03.1_0+x11 (active)
pngquant @3.0.3_0 (active)
tesseract @5.4.1_2 (active)
Describe the bug
Corrupt JPEG data: premature end of data segment
is printed at the end of the run with some PDF files. However, the files produced by OCRmyPDF are perfectly usable.
Steps to reproduce
Files
bid$.pdf
How did you download and install the software?
PyPI (pip, poetry, pipx, etc.)
OCRmyPDF version
16.1.1
Relevant log output