qurator-spk / sbb_binarization

Document Image Binarization
Apache License 2.0

Numpy issue? #57

Closed kba closed 1 year ago

kba commented 1 year ago

@jbarth-ubhd in https://github.com/OCR-D/ocrd_all/issues/348

Dear reader, tried sbb-textline (recommended on step 1), got this error message:

  File "/usr/local/lib/python3.6/site-packages/sbb_binarize/sbb_binarize.py", line 270, in run
    img_last[:, :][img_last[:, :] > 0] = 255
TypeError: 'int' object is not subscriptable

For the (almost) complete workflow environment, see https://digi.ub.uni-heidelberg.de/diglitData/v/duerer1527_-_aa.zip. It extracts to the directory "aa". See run.sh and ocrd.log.

ocrd.sif -> ocrd-2022-08-15.sif (not a newer one because of OCR-D/ocrd_tesserocr#189)

@bertsky:

Probably one of the recent Numpy changes.

Which version of numpy and sbb_binarize is this?

@jbarth-ubhd:

  $ singularity exec -e ocrd.sif ocrd-sbb-binarize -V
  Version 0.0.10, ocrd/core 2.38.0

  $ singularity exec -e ocrd.sif python -c "import numpy as np; print(np.__version__)"
  1.19.5
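For readers trying to interpret the TypeError quoted above: Python raises this message whenever the object being indexed is a plain int rather than an array. The following is a minimal sketch that reproduces the message in isolation. It assumes, purely for illustration, that img_last was still an integer when line 270 ran; the thread does not confirm why that happened, and threshold_in_place and reproduce are hypothetical names, not part of sbb_binarize.

```python
def threshold_in_place(img_last):
    """Run the statement from line 270 of the traceback on img_last."""
    img_last[:, :][img_last[:, :] > 0] = 255
    return img_last


def reproduce():
    """Call the statement with a plain int and return the resulting error text.

    Assumption for illustration only: img_last is an int (e.g. a counter or
    accumulator that was never replaced by an image array).
    """
    try:
        threshold_in_place(0)
    except TypeError as exc:
        return str(exc)
    return None


message = reproduce()
print(message)
```

With an actual NumPy image array the same statement is valid and sets every positive pixel to 255, so the error points at the type of img_last rather than at the indexing expression itself.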

jbarth-ubhd commented 1 year ago

PS: ocrd.sif is from docker 2022-08-15

bertsky commented 1 year ago

@jbarth-ubhd I cannot reproduce with numpy 1.21 up to 1.24. The log archive linked above (aa.zip) yields a 403. Could you please assist?

Otherwise, if this only applied to numpy 1.19 – can we close?

jbarth-ubhd commented 1 year ago

Oops. Wrong permissions. Now downloadable.

jbarth-ubhd commented 1 year ago

Tested with ocrd/all:latest → this bug is fixed.

Now:

  File "/usr/local/lib/python3.8/site-packages/sbb_binarize/sbb_binarize.py", line 40, in __init__
    raise ValueError(f"No models found in {self.model_dir}")
ValueError: No models found in /home/hd/hd_hd/hd_wu120/ocrd_models/sbb-binarize/models/model_bin_sbb_ens.h5
Command exited with non-zero status 1

[hd_wu120@o05i15 aa]$ md5sum /home/hd/hd_hd/hd_wu120/ocrd_models/sbb-binarize/models/model_bin_sbb_ens.h5
bad4dddb8db72bd06fed67c6117e67b7  /home/hd/hd_hd/hd_wu120/ocrd_models/sbb-binarize/models/model_bin_sbb_ens.h5

OK, the md5sum of the model begins with "bad...", which is self-explanatory.
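A quick way to narrow down a "No models found" error is to check what a model lookup would actually see at the configured path. The sketch below is not the sbb_binarize implementation: find_model_files is a hypothetical helper, and the "*.h5" pattern and the directory-only lookup are assumptions made for illustration. It shows that such a lookup succeeds for a directory containing a model file and fails when handed the path of the file itself.

```python
from pathlib import Path
import tempfile


def find_model_files(model_dir, pattern="*.h5"):
    """Hypothetical helper: list model files directly inside model_dir.

    Raises ValueError if model_dir is not a directory or contains no match.
    """
    path = Path(model_dir)
    files = sorted(path.glob(pattern)) if path.is_dir() else []
    if not files:
        raise ValueError(f"No models found in {model_dir}")
    return files


with tempfile.TemporaryDirectory() as tmp:
    # Create an empty placeholder file standing in for a model.
    model = Path(tmp) / "model_example.h5"
    model.write_bytes(b"")

    # Passing the directory finds the placeholder.
    found = [p.name for p in find_model_files(tmp)]

    # Passing the file path itself finds nothing and raises.
    try:
        find_model_files(model)
        file_error = None
    except ValueError as exc:
        file_error = str(exc)

print(found)
print(file_error)
```

Whether the real lookup expects a directory of .h5 files or a different layout depends on the installed version, so the result is only a hint about where to look.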

bertsky commented 1 year ago

Regarding the new error ("Now: ..."): can you please state the command that you used, and ideally also how you deployed your models?

jbarth-ubhd commented 1 year ago

The models from https://qurator-data.de/sbb_binarization/ do not work; I had to use https://github.com/apacha/sbb_binarization/releases/download/pre-trained-models/model_2020_01_16.zip instead (the URL listed by ocrd resmgr list-available).

bertsky commented 1 year ago

Better yet: use the models attached to the latest release on the main repo. See also #59.

So everything is working for you now?

jbarth-ubhd commented 1 year ago

Yes, the bug TypeError: 'int' object is not subscriptable is gone.

cneud commented 1 year ago

Should be fine thanks to @bertsky's fixes in https://github.com/qurator-spk/sbb_binarization/pull/59, which include the clean SavedModel. Closing here.