qurator-spk / sbb_binarization

Document Image Binarization
Apache License 2.0

Cannot load models in qurator-data git-annex #7

Closed (mikegerber closed this issue 3 years ago)

mikegerber commented 3 years ago
$ ocrd-sbb-binarize --overwrite -I OCR-D-IMG -O OCR-D-IMG-BIN -P model /var/lib/sbb_binarization
18:35:13.783 INFO processor.SbbBinarize - INPUT FILE 0 / PHYS_0024
18:35:13.787 INFO processor.SbbBinarize - Binarizing on 'page' level in page 'PHYS_0024'
/var/lib/sbb_binarization/.gitkeep
Traceback (most recent call last):
  File "/usr/local/bin/ocrd-sbb-binarize", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/sbb_binarize/ocrd_cli.py", line 115, in cli
    return ocrd_cli_wrap_processor(SbbBinarizeProcessor, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ocrd/processor/helpers.py", line 69, in run_processor
    processor.process()
  File "/usr/local/lib/python3.6/dist-packages/sbb_binarize/ocrd_cli.py", line 66, in process
    bin_image = cv2pil(binarizer.run(image=pil2cv(page_image), use_patches=True))
  File "/usr/local/lib/python3.6/dist-packages/sbb_binarize/sbb_binarize.py", line 199, in run
    res = self.predict(model_in, image, use_patches)
  File "/usr/local/lib/python3.6/dist-packages/sbb_binarize/sbb_binarize.py", line 47, in predict
    model, model_height, model_width, n_classes = self.load_model(model_name)
  File "/usr/local/lib/python3.6/dist-packages/sbb_binarize/sbb_binarize.py", line 40, in load_model
    model = load_model(join(self.model_dir, model_name), compile=False)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 583, in load_model
    with H5Dict(filepath, mode='r') as h5dict:
  File "/usr/local/lib/python3.6/dist-packages/keras/utils/io_utils.py", line 191, in __init__
    self.data = h5py.File(path, mode=mode)
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

The directory /var/lib/sbb_binarization is a copy of sbb_binarization/ from our private qurator-data git-annex, which happens to include a .gitkeep file. The current code tries to load that file as an HDF5 model, hence the "file signature not found" error.
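For context on the error message: an HDF5 file starts with a fixed 8-byte signature, and a placeholder file such as .gitkeep (assumed here to be empty, as is typical) does not contain it. The following is a minimal sketch using only the standard library; the helper name has_hdf5_signature and the file names are hypothetical, and the check only looks at offset 0, ignoring the optional user block that the HDF5 format allows before the signature.

```python
import os
import tempfile

# The 8-byte signature that begins an HDF5 file (when there is no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Hypothetical helper for illustration; it only inspects offset 0.
    """
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    # An empty placeholder, standing in for .gitkeep (assumed empty).
    placeholder = os.path.join(tmp, ".gitkeep")
    open(placeholder, "wb").close()

    # Not a valid model: just the signature plus padding, enough for the check.
    fake_model = os.path.join(tmp, "fake_model.h5")
    with open(fake_model, "wb") as f:
        f.write(HDF5_SIGNATURE + b"\x00" * 8)

    placeholder_ok = has_hdf5_signature(placeholder)
    fake_model_ok = has_hdf5_signature(fake_model)

print(placeholder_ok, fake_model_ok)  # False True
```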

mikegerber commented 3 years ago

I deleted .gitkeep, but I still think the code should only try to load *.h5 files.

kba commented 3 years ago

The code currently just calls os.listdir on the model directory; it should glob for *.h5 instead. I'll send a PR.
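The difference between the two approaches can be sketched as follows. This is an illustration only, not the code from the PR: the function name list_model_files and the file names in the demo are hypothetical, and it assumes the models sit directly inside the model directory.

```python
import glob
import os
import tempfile


def list_model_files(model_dir):
    """Return sorted paths of the *.h5 files directly inside model_dir.

    Hypothetical helper: unlike os.listdir, this skips anything that does
    not end in .h5 (for example .gitkeep or a README).
    """
    return sorted(glob.glob(os.path.join(model_dir, "*.h5")))


with tempfile.TemporaryDirectory() as model_dir:
    # Create empty stand-in files; the names are made up for the demo.
    for name in (".gitkeep", "model1.h5", "model2.h5", "README.txt"):
        open(os.path.join(model_dir, name), "w").close()

    everything = sorted(os.listdir(model_dir))
    found = [os.path.basename(p) for p in list_model_files(model_dir)]

print(everything)  # includes .gitkeep and README.txt
print(found)       # ['model1.h5', 'model2.h5']
```

With a filter like this, stray files in the model directory are never handed to the model loader, so deleting .gitkeep by hand would not be necessary.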