tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

[RFC] The legacy engine - should we keep it or drop it? #4342

Open amitdo opened 2 hours ago

amitdo commented 2 hours ago

We talked about this topic years ago.

@stweil, do you still think we should keep it?

stweil commented 2 hours ago

We just added a legacy model to tessdata_contrib ...

And yes, there is still no alternative to get character attributes.

amitdo commented 1 hour ago

Regarding font attributes, they are not very reliable.

amitdo commented 1 hour ago

https://github.com/tesseract-ocr/tesseract/issues/433