qurator-spk / eynollah

Document Layout Analysis
Apache License 2.0

Produces text results #37

Closed mikegerber closed 2 years ago

mikegerber commented 3 years ago

Currently, ocrd-eynollah-segment produces (empty) TextEquiv elements. I believe it should not produce any, as this results in OCR processors giving a lot of warnings:

22:28:22.628 WARNING processor.CalamariRecognize - Line 'region_0081_line_0001' already contained text results
22:28:22.636 WARNING processor.CalamariRecognize - Line 'region_0081_line_0002' already contained text results
22:28:22.665 WARNING processor.CalamariRecognize - Line 'region_0081_line_0003' already contained text results
22:28:22.695 WARNING processor.CalamariRecognize - Line 'region_0081_line_0004' already contained text results
22:28:22.734 WARNING processor.CalamariRecognize - Line 'region_0081_line_0005' already contained text results

mikegerber commented 2 years ago

image

Just so you can understand why this bothers me: this is just one page of the output I get when I run OCR on the files produced by eynollah.

Please don't generate any TextEquiv elements in eynollah's output.
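
Until that is changed, a possible workaround is to strip the empty TextEquiv elements from the PAGE-XML before running OCR. The following is only a minimal sketch, not part of eynollah or OCR-D: the function names `local_name` and `strip_empty_text_equivs` are hypothetical, the sample XML is made up for illustration, and "empty" is assumed to mean that no child of the TextEquiv (e.g. Unicode) contains non-whitespace text.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_empty_text_equivs(root):
    """Remove TextEquiv elements that carry no text; return how many were removed."""
    removed = 0
    # Materialise the element list first because the tree is mutated while walking it.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) != "TextEquiv":
                continue
            # Assumption: a TextEquiv is empty when none of its children
            # (Unicode, PlainText) holds non-whitespace text.
            if not any((sub.text or "").strip() for sub in child):
                parent.remove(child)
                removed += 1
    return removed


# Made-up sample, loosely modelled on the line IDs from the warnings above.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <TextRegion id="region_0081">
      <TextLine id="region_0081_line_0001">
        <TextEquiv><Unicode></Unicode></TextEquiv>
      </TextLine>
      <TextLine id="region_0081_line_0002">
        <TextEquiv><Unicode>kept</Unicode></TextEquiv>
      </TextLine>
      <TextEquiv><Unicode/></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

root = ET.fromstring(SAMPLE)
removed_count = strip_empty_text_equivs(root)
print(f"removed {removed_count} empty TextEquiv element(s)")
```

Matching on the local tag name keeps the sketch independent of the PAGE schema version used in the namespace. TextEquiv elements that already contain text are left untouched.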