yusanshi / pdf-image-binarization

Binarize all images in a scanned PDF file (convert scanned PDFs to black and white).

Error: decoder libtiff not available #1

Closed brhr-iwao closed 2 years ago

brhr-iwao commented 2 years ago

Hello, I tried to convert a scanned PDF with pdf_image_binarization on Cygwin (CYGWIN_NT-10.0-22000 3.3.5-341.x86_64) and Windows 11 Home (Version 21H2) with opencv-3.4.1, poppler-21.01.0-1, libtiff-4.2.0 and python-3.9.1 (Python packages: img2pdf-0.4.4, libtiff-0.4.2).

I have the following error:

$ python3 binarize.py
Begin converting for **.pdf.
Extracting images from **.pdf
===========================================================
Christian Wolf, LIRIS Laboratory, Lyon, France.
christian.wolf@liris.cnrs.fr
Version 2.4 (August 1st, 2014)
===========================================================
Adaptive binarization
Threshold calculation: Wolf and Jolion (2001)
parameter k=0.3
Input size: 2319x3000
Setting window size to [40,40].
surface created
Writing binarized image to file 'temp/converted/image-1.tif'.
===========================================================
Christian Wolf, LIRIS Laboratory, Lyon, France.
christian.wolf@liris.cnrs.fr
Version 2.4 (August 1st, 2014)
===========================================================
Adaptive binarization
Threshold calculation: Wolf and Jolion (2001)
parameter k=0.3
Input size: 2319x3000
Setting window size to [40,40].
surface created
Writing binarized image to file 'temp/converted/image-2.tif'.
===========================================================
Christian Wolf, LIRIS Laboratory, Lyon, France.
christian.wolf@liris.cnrs.fr
Version 2.4 (August 1st, 2014)
===========================================================
Adaptive binarization
Threshold calculation: Wolf and Jolion (2001)
parameter k=0.3
Input size: 2319x3000
Setting window size to [40,40].
surface created
Writing binarized image to file 'temp/converted/image-3.tif'.
===========================================================
Christian Wolf, LIRIS Laboratory, Lyon, France.
christian.wolf@liris.cnrs.fr
Version 2.4 (August 1st, 2014)
===========================================================
Adaptive binarization
Threshold calculation: Wolf and Jolion (2001)
parameter k=0.3
Input size: 2319x3000
Setting window size to [40,40].
surface created
Writing binarized image to file 'temp/converted/image-4.tif'.
Combining images ...
error: decoder libtiff not available
Finish converting for **.pdf.

The following test script seems to work fine, so I guess pylibtiff itself works.

# simple_test.py
from libtiff import TIFF
file = 'filename.tif'
img = TIFF.open(file, mode='r').read_image()
print(img.dtype)
print(img.size)

$ python simple_test.py
uint8
270000

What do you think?

yusanshi commented 2 years ago

It looks like the issue is with the img2pdf package. I searched its issues and only found this.
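
As far as I can tell, img2pdf reads images through Pillow rather than pylibtiff, and "decoder libtiff not available" is the message Pillow gives when it was built without libtiff support. A minimal sketch for checking that on the affected machine follows; the helper name pillow_has_libtiff is made up for illustration and is not part of binarize.py.

```python
def pillow_has_libtiff():
    """Return True/False for Pillow's libtiff support, or None if Pillow is missing.

    Hypothetical helper for diagnosis only; not part of binarize.py.
    """
    try:
        from PIL import features
    except ImportError:
        # Pillow is not installed in this interpreter at all.
        return None
    # check_codec reports whether this Pillow build was compiled with the codec.
    return bool(features.check_codec('libtiff'))


print('Pillow libtiff support:', pillow_has_libtiff())
```

If this prints False, the TIFF files themselves are probably fine and the Pillow build is the part lacking TIFF support, which would explain why the pylibtiff test script passes.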

I don't use Windows, so I can't offer much useful advice. But anyway, I think you can:

  1. Change the code to use PNG or JPEG as the auxiliary images file instead of TIFF.
  2. Use WSL on Windows.

brhr-iwao commented 2 years ago

I edited binarize.py as follows:

- subprocess.run(['pdftoppm', '-tiff', '-tiffcompression',
-                 'lzw', '-scale-to', '3000', pdf_path, os.path.join(image_dir_path, 'image')])
+ subprocess.run(['pdftoppm', '-png', '-tiffcompression',
+                 'lzw', '-scale-to', '3000', pdf_path, os.path.join(image_dir_path, 'image')])

- def images_to_pdf(image_dir_path, pdf_path, extension='*.tif'):
+ def images_to_pdf(image_dir_path, pdf_path, extension='*.png'):

This works fine! Thank you so much for your suggestion!
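
The edit above leaves '-tiffcompression', 'lzw' in the pdftoppm call alongside '-png', which evidently did no harm here. A slightly tidier sketch passes that flag only for TIFF output. The function build_pdftoppm_command and the body of images_to_pdf are assumptions written for illustration, since only the changed lines of binarize.py are shown in this thread.

```python
import glob
import os


def build_pdftoppm_command(pdf_path, image_dir_path, image_format='png'):
    """Return the pdftoppm argument list for the given output format.

    Hypothetical helper; binarize.py builds this list inline.
    """
    command = ['pdftoppm', f'-{image_format}']
    if image_format == 'tiff':
        # -tiffcompression only makes sense for TIFF output.
        command += ['-tiffcompression', 'lzw']
    command += ['-scale-to', '3000', pdf_path,
                os.path.join(image_dir_path, 'image')]
    return command


def images_to_pdf(image_dir_path, pdf_path, extension='*.png'):
    """Combine the images in image_dir_path into one PDF (assumed body)."""
    import img2pdf  # imported here so the helper above works without it

    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    image_paths = sorted(glob.glob(os.path.join(image_dir_path, extension)))
    with open(pdf_path, 'wb') as f:
        f.write(img2pdf.convert(image_paths))
```

The command list can then be passed to subprocess.run as before, for example subprocess.run(build_pdftoppm_command(pdf_path, image_dir_path)).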