mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

"Object" level error should not fail without `--raise-on-error` on segmentation #443

Closed by PonteIneptique 1 year ago

PonteIneptique commented 1 year ago

My current understanding of `--raise-on-error` is that it is meant as a way to debug a set of documents, or Kraken itself, in depth. Unfortunately, `blla.segment` raises at the line level in some places, which prevents the rest of the document from being processed:

https://github.com/mittagessen/kraken/blob/89adb865a419c87d3b17b75c799f5b8ad76aa6ca/kraken/lib/segmentation.py#L729-L732

I proposed a first potential patch in #442, but if I were to propose a "better" patch, I would probably add a `raise_on_error` parameter to `blla.segment` and `segmentation.calculate_polygonal_environment` to decide between logging and raising at line 730.
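To make the idea concrete, here is a minimal sketch of the pattern. It is not Kraken's actual code: `process_lines` and `compute_polygon` are hypothetical stand-ins for the per-line loop and the polygonizer, and only the `raise_on_error` name comes from the proposal above.

```python
import logging
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger(__name__)


def process_lines(lines: Sequence[Sequence[float]],
                  compute_polygon: Callable[[Sequence[float]], List[float]],
                  raise_on_error: bool = False) -> List[Optional[List[float]]]:
    """Hypothetical per-line loop: log and continue by default, raise on request."""
    polygons: List[Optional[List[float]]] = []
    for idx, line in enumerate(lines):
        try:
            polygons.append(compute_polygon(line))
        except Exception as exc:
            if raise_on_error:
                # Debugging mode: surface the original exception unchanged.
                raise
            # Default mode: record the failure and keep processing the document.
            logger.warning('Polygonizer failed on line %d: %s', idx, exc)
            polygons.append(None)
    return polygons
```

With this shape, a single bad line yields a `None` placeholder and a warning, while passing `raise_on_error=True` restores the fail-fast behaviour for debugging.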

I'll propose a second patch so that you don't have to code it yourself, and you can choose the version you prefer.

mittagessen commented 1 year ago

Thanks, I merged them.