qurator-spk / eynollah

Document Layout Analysis
Apache License 2.0

contour extraction: inhomogeneous shape #92

Closed by bertsky 1 year ago

bertsky commented 1 year ago

Running on a longer set of images, eynollah stumbles over:

Traceback (most recent call last):
  File "/local/ocr-d/ocrd_all/venv/bin/ocrd-eynollah-segment", line 8, in <module>
    sys.exit(main())
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/ocrd_cli.py", line 8, in main
    return ocrd_cli_wrap_processor(EynollahProcessor, *args, **kwargs)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/ocrd/decorators/__init__.py", line 117, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/ocrd/processor/helpers.py", line 107, in run_processor
    processor.process()
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/processor.py", line 58, in process
    Eynollah(**eynollah_kwargs).run()
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/eynollah.py", line 2446, in run
    contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (17,) + inhomogeneous part.

This is eynollah a6fe781033cb749930e949115c5b165f02fdacc0 / Python 3.8 / TF 2.10 / Numpy 1.24.2 / Shapely 2.0.1.

I'll try to figure out some more about the particular input image.

bertsky commented 1 year ago

The simple reason is that NumPy no longer allows this implicit conversion: as of version 1.24, building an array from ragged nested sequences raises an error. It used to emit only a warning:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

Obviously, adding dtype=object in all these cases fixes the problem.
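
To illustrate, here is a minimal sketch of the failing pattern and the dtype=object variant. The contour arrays, the index array `order`, and the helper name `reorder` are made up for this example; only the `list(np.array(...)[...])` pattern comes from the traceback.

```python
import numpy as np

# Hypothetical stand-ins for OpenCV-style contours: each has shape
# (n_points, 1, 2), and n_points differs, so the list is "ragged".
contours = [
    np.zeros((4, 1, 2), dtype=np.int32),
    np.zeros((6, 1, 2), dtype=np.int32),
    np.zeros((3, 1, 2), dtype=np.int32),
]
order = np.array([2, 0, 1])  # hypothetical reordering indices

# Without dtype=object, NumPy >= 1.24 raises ValueError here;
# older versions only emitted VisibleDeprecationWarning.
try:
    np.array(contours)
    raised = False
except ValueError:
    raised = True


def reorder(contours, order):
    """Reorder a ragged list of contours via a 1-D object array."""
    return list(np.array(contours, dtype=object)[order])


reordered = reorder(contours, order)
print([c.shape for c in reordered])
```

With dtype=object, each contour is stored as a single element, so fancy indexing reorders whole contours and `list(...)` returns the original arrays in the new order.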

Is it ok for me to include the fix in #91?

00sapo commented 10 months ago

Hello, still same issue here:

Traceback (most recent call last):
  File "/home/sapo/develop/AutoDocAugment/.venv/bin/eynollah", line 8, in <module>
    sys.exit(main())
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/cli.py", line 193, in main
    eynollah.run()
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/eynollah.py", line 2904, in run
    contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (7,) + inhomogeneous part.

Sometimes the same error is raised at another, identical line:

  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/cli.py", line 193, in main
    eynollah.run()
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/eynollah.py", line 2982, in run
    contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.

My environment:

dependencies = [
    "scikit-image[optional]>=0.20.0",
    "requests>=2.31.0",
    "beautifulsoup4>=4.12.2",
    "rich>=13.3.5",
    "toml>=0.10.2",
    "latex @ git+https://github.com/gvasold/latex.git",
    "opencv-python>=4.7.0.72",
    "jinja2>=3.1.2",
    "pymupdf>=1.22.3",
    "augraphy>=8.2.3",
    "requests-cache>=1.0.1",
    "lxml>=4.9.2",
    "numpy>=1.23.5",
    "pytesseract>=0.3.10",
    "tensorflow>=2.4,<2.12", # constraint due to eynollah
    "eynollah>=0.3.0",
]
requires-python = ">=3.10,<3.11"  # constraint due to eynollah

Adding dtype=object as in this commit solved the issue for me.
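
One caveat, offered as a side note and not something the eynollah code is known to do: when all items happen to share the same shape, `np.array(items, dtype=object)` builds a multi-dimensional object array instead of a 1-D array of items. A sketch of a helper that always yields one element per item follows; the name `to_object_array` is hypothetical.

```python
import numpy as np


def to_object_array(items):
    """Pack items into a 1-D object array, one element per item."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # stores the item itself, no broadcasting
    return out


# Two hypothetical contours that happen to have the same shape.
same_shape = [np.zeros((4, 1, 2)), np.ones((4, 1, 2))]

direct = np.array(same_shape, dtype=object)  # stacks into 4 dimensions
packed = to_object_array(same_shape)         # stays 1-D

print(direct.shape, packed.shape)
```

For the indexing pattern in the traceback both forms work, since indexing along the first axis still selects whole contours; the helper only matters if later code relies on the array being 1-D.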

vahidrezanezhad commented 10 months ago

Dear @00sapo,

As you pointed out, this issue had previously been resolved in commit a56988a, but for some reason the fix seems to have been lost in the most recent version. I have reapplied the commit to address it once more. Thank you.