qurator-spk / sbb_binarization

Document Image Binarization
Apache License 2.0

Document supported Python versions #39

Closed: mikegerber closed this issue 1 year ago

mikegerber commented 2 years ago

sbb_binarization currently needs TensorFlow 2.4, which is not available for Python 3.10, the default on my Linux installation. Which Python versions are supported?

mikegerber commented 2 years ago

Ah, I was testing the branch transformer_model_integration, which has different requirements than master:

--- a/requirements.txt
+++ b/requirements.txt
@@ -2,4 +2,4 @@ numpy
 setuptools >= 41
 opencv-python-headless
 ocrd >= 2.22.3
-tensorflow >= 2.4.0
+tensorflow == 2.4.*

Something to keep in mind when merging?

mikegerber commented 1 year ago

Using Python 3.10 I get:

% sbb_binarize --model-dir ~/devel/qurator-data/sbb_binarization/2022-08-16/ --patches OCR-D-IMG_00000024.tif OCR-D-IMG_00000024.out.tif
Traceback (most recent call last):
  File "/home/mike/.virtualenvs/sbb_binarization_transformer_model_integration/bin/sbb_binarize", line 33, in <module>
    sys.exit(load_entry_point('sbb-binarization', 'console_scripts', 'sbb_binarize')())
  File "/home/mike/.virtualenvs/sbb_binarization_transformer_model_integration/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/mike/.virtualenvs/sbb_binarization_transformer_model_integration/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/mike/.virtualenvs/sbb_binarization_transformer_model_integration/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mike/.virtualenvs/sbb_binarization_transformer_model_integration/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/mike/devel/2022-08 eval sbb_binarization_transformer/sbb_binarization/sbb_binarize/cli.py", line 15, in main
    SbbBinarizer(model_dir).run(image_path=input_image, use_patches=patches, save=output_image)
  File "/home/mike/devel/2022-08 eval sbb_binarization_transformer/sbb_binarization/sbb_binarize/sbb_binarize.py", line 98, in __init__
    self.models.append(self.load_model(model_file))
  File "/home/mike/devel/2022-08 eval sbb_binarization_transformer/sbb_binarization/sbb_binarize/sbb_binarize.py", line 117, in load_model
    model = load_model(join(self.model_dir, model_name) , compile=False,custom_objects = {"PatchEncoder": PatchEncoder, "Patches": Patches})
  File "/home/mike/.virtualenvs/sbb_binarization_transformer_model_integration/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/mike/.virtualenvs/sbb_binarization_transformer_model_integration/lib/python3.10/site-packages/keras/utils/generic_utils.py", line 793, in func_load
    code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)

mikegerber commented 1 year ago

I noticed another problem: with Python 3.6 I get no console output (which is good), but with Python 3.7 I get lots of seemingly useless progress bars like this:

[image: screenshot of the progress bar output]

bertsky commented 1 year ago

> Using Python 3.10 I get:
> ...
> ValueError: bad marshal data (unknown type code)

IIRC that's because the Keras (H5) model format uses native Python serialization (marshal), which is Python-version dependent. If the models were converted to the TensorFlow SavedModel format (which can be as simple as loading them on the right Python/Keras version and saving them again without the .h5 extension), they should be much more interoperable.
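The traceback shows where this bites: Keras' func_load calls marshal.loads on stored bytecode. Below is a minimal sketch of that round trip, assuming (as the traceback suggests) that the H5 format stores some functions, for example those of Lambda layers, as marshal-serialized bytecode. Within one interpreter it works; across Python versions it may not, because the Python documentation states the marshal format can change between versions, and bytecode itself differs between versions.

```python
import marshal
import sys
import types


def double(x):
    return x * 2


# Serialize only the function's bytecode, similar in spirit to how the
# Keras H5 format stores some functions (assumption based on the traceback).
raw_code = marshal.dumps(double.__code__)

# Loading works here because the same interpreter wrote the bytes.
# Bytes written by a different Python version may instead fail with
# errors such as "ValueError: bad marshal data (unknown type code)".
code = marshal.loads(raw_code)
restored = types.FunctionType(code, globals(), "double")

print(sys.version_info[:2], restored(21))
```

A SavedModel stores a serialized TensorFlow graph instead of Python bytecode, which is why converting avoids this class of error.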

bertsky commented 1 year ago

I just verified that converting the models to the TensorFlow SavedModel format works for this package:

  1. on Python 3.6, 3.7 or 3.8, run:
    from tensorflow.keras.models import load_model
    m = load_model('/path/to/model_bin_sbb_ens.h5', compile=False)
    m.save('/path/to/model_bin_sbb_ens')
  2. patch the H5 loader (https://github.com/qurator-spk/sbb_binarization/blob/f11d0b0bf741253c55930c34e58e7e10718cb652/sbb_binarize/sbb_binarize.py#L36-L38) roughly like so:

    @@ -35,6 +35,8 @@
    
             self.model_files = glob('%s/*.h5' % self.model_dir)
             if not self.model_files:
    +            self.model_files = glob('%s/*/' % self.model_dir)
    +        if not self.model_files:
                 raise ValueError(f"No models found in {self.model_dir}")
    
             self.models = []
  3. use the newly saved model directory (with sbb_binarization installed on any Python from 3.6 to 3.10) and be happy ever after
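The fallback in the patch from step 2 can be tried out on its own. A minimal sketch, where find_model_files is a hypothetical helper (not part of sbb_binarization) that mirrors the patched logic: prefer *.h5 files and, only if there are none, treat every subdirectory of the model directory as a SavedModel directory.

```python
from glob import glob
import os
import tempfile


def find_model_files(model_dir):
    """Hypothetical helper mirroring the patched loader logic."""
    # First look for legacy Keras H5 model files.
    model_files = glob('%s/*.h5' % model_dir)
    if not model_files:
        # Fall back to subdirectories, assumed to be SavedModel directories.
        # The trailing slash makes glob match directories only.
        model_files = glob('%s/*/' % model_dir)
    if not model_files:
        raise ValueError(f"No models found in {model_dir}")
    return model_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as model_dir:
        # Simulate a converted model directory as produced by m.save(...).
        os.mkdir(os.path.join(model_dir, "model_bin_sbb_ens"))
        print(find_model_files(model_dir))
```

Note that H5 files win whenever both kinds are present, so a model directory still containing the old .h5 file would keep loading it.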
cneud commented 1 year ago

Yes, the same was also reported by @apacha, who kindly already did the conversion here.

I have also published the SavedModel to the Hugging Face Hub: https://huggingface.co/SBB/sbb_binarization, but we still have to update the OCR-D resource manager (resmgr) accordingly.

cneud commented 1 year ago

So, regarding the original question: with https://github.com/qurator-spk/sbb_binarization/pull/59 we currently support Python 3.7 to 3.10. I will update the README accordingly.
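Besides the README, the supported range could also be checked at runtime. A minimal sketch, where SUPPORTED_PYTHON and check_python_version are hypothetical names (not part of sbb_binarization) encoding the range stated above; the declarative packaging equivalent would be setuptools' python_requires=">=3.7, <3.11" in setup.py.

```python
import sys

# Hypothetical constant: the range stated in this issue (Python 3.7 to 3.10).
SUPPORTED_PYTHON = ((3, 7), (3, 10))


def check_python_version(version=None):
    """Return True if (major, minor) lies within SUPPORTED_PYTHON."""
    if version is None:
        version = sys.version_info[:2]
    lowest, highest = SUPPORTED_PYTHON
    return lowest <= tuple(version[:2]) <= highest


if __name__ == "__main__":
    if not check_python_version():
        major, minor = sys.version_info[:2]
        print(f"Warning: Python {major}.{minor} is outside the "
              "supported range 3.7 to 3.10", file=sys.stderr)
```

Comparing integer tuples makes (3, 10) sort after (3, 9), which a naive string comparison of "3.10" and "3.9" would get wrong.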

cneud commented 1 year ago

Included in https://github.com/qurator-spk/sbb_binarization/commit/42bca1441cd9535c7af8b98d62eed550bfe3ddd6