Open eyalroz opened 2 years ago
See https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html. All work on Tesseract is currently done by volunteers, so you are invited to find the answers to your questions and document them.
@stweil : Can you linkify the "100 languages" sentence in the README.md to point to that page?
@eyalroz I went ahead and proposed the change in the tesseract repo: https://github.com/tesseract-ocr/tesseract/pull/4027
I also think it would be very helpful, even though the list itself has no information on languages in v5 yet.
> Even though the list itself has no information on languages in v5 yet.
There was no update for v5. All the v4 data files should work with Tesseract 5.x.
> There was no update for v5. All the v4 data files should work with Tesseract 5.x.
That's at least not obvious from the table.
The information can be found in other parts of the docs, true. Users can easily miss it though.
> Language model traineddata files same as listed above for version 4.0.0 can be used with Tesseract 5.x.x.
The README.md says tesseract "supports over 100 languages out of the box". But - which languages? And what quality is the support for different languages known to be, out of the box?
It would be helpful if a separate file (or wiki page) would detail, to the extent possible, this information.
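As a partial answer to "which languages?" for a given machine, the languages a local install can actually use can be listed with the `tesseract --list-langs` command. Below is a minimal sketch, assuming the `tesseract` binary is on `PATH` and that the command prints one header line followed by one language code per line. The helper names `parse_list_langs` and `installed_languages` are made up for illustration and are not part of Tesseract. This only reports what is installed locally; it says nothing about recognition quality per language.

```python
from __future__ import annotations

import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the first non-empty line is a header such as
    # 'List of available languages (3):' and the remaining lines are codes.
    if lines and lines[0].lower().startswith("list of available"):
        lines = lines[1:]
    return sorted(lines)


def installed_languages(binary: str = "tesseract") -> list[str]:
    """Run the tesseract CLI and return the installed language codes."""
    result = subprocess.run(
        [binary, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: some builds may write the list to stderr instead of stdout,
    # so fall back to stderr when stdout is empty.
    return parse_list_langs(result.stdout or result.stderr)
```

Calling `installed_languages()` on a machine with Tesseract installed should return codes such as `eng` or `deu`, depending on which traineddata files are present.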