OCR fails if image is rotated

robertknight / tesseract-wasm

JS/WebAssembly build of the Tesseract OCR engine for use in browsers and Node

https://robertknight.github.io/tesseract-wasm/

BSD 2-Clause "Simplified" License


OCR fails if image is rotated #29

Open robertknight opened 2 years ago

robertknight commented 2 years ago

OCR completely fails if the image is rotated by 90, 180 or 270 degrees. Tesseract has built-in orientation detection, which could be used to resolve this.

robertknight commented 2 years ago

Tesseract's built-in orientation detection requires the library to be built with the legacy / non-LSTM text recognition engine. Leptonica also has some built-in orientation detection functionality. So there are a few options:

- Compile Tesseract with the legacy engine included, so its orientation detection can be used. This increases the WASM binary size from 1.6 MB to 2.3 MB in my testing.
- Use Leptonica's orientation detection.
- Don't support orientation detection and leave it as a problem for the consumer.

robertknight commented 2 years ago

Work in progress at https://github.com/robertknight/tesseract-wasm/pull/34.

robertknight commented 2 years ago

https://github.com/robertknight/tesseract-wasm/pull/34 adds a partial solution in the form of orientation detection. However, the algorithm is simplistic, so any application would probably need user input to confirm actions that depend on its result.
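As a rough illustration of what "requiring user input" could look like, here is a minimal sketch of gating an automatic rotation on the detector's confidence. The `OrientationEstimate` shape, the `decideRotation` helper and the `0.9` threshold are all assumptions made for this example; they are not taken from the library's API or from the pull request.

```typescript
// Hypothetical shape of an orientation estimate (an assumption for this sketch,
// not the library's actual API).
interface OrientationEstimate {
  rotation: 0 | 90 | 180 | 270; // assumed: detected clockwise rotation of the image, in degrees
  confidence: number; // assumed: 0 (no information) to 1 (certain)
}

type RotationDecision =
  | { kind: "keep" }
  | { kind: "rotate"; degrees: number }
  | { kind: "ask-user"; suggestedDegrees: number };

// Decide whether to correct the image automatically or defer to the user.
// `minConfidence` is an arbitrary illustrative threshold.
function decideRotation(
  estimate: OrientationEstimate,
  minConfidence = 0.9
): RotationDecision {
  // Clockwise rotation that would undo the detected rotation.
  const correction = (360 - estimate.rotation) % 360;

  if (estimate.confidence < minConfidence) {
    // Low confidence: show the suggestion, but let the user confirm it.
    return { kind: "ask-user", suggestedDegrees: correction };
  }
  if (correction === 0) {
    return { kind: "keep" };
  }
  return { kind: "rotate", degrees: correction };
}
```

An application that treats the detector as unreliable could set `minConfidence` above 1 so that every result goes through the confirmation path.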

robertknight commented 2 years ago

I posted a comment on Hacker News and someone responded with a test case where the word recognition works well, but the text is not output in the correct order, due to rotation of the image:

[Image: C1jn2Kz]

If you compare the text output for this image in the demo with the output for a copy rotated so that the text baselines are straight, you can see that the layout analysis produces different results.

johanvaneck commented 6 months ago

Any updates on this?

robertknight commented 6 months ago

No. Ensuring the input is correctly oriented is currently a problem that users of the library have to solve.
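For consumers who already know how an image is rotated (for example from user input), one way to correct it before OCR is to rotate the pixel buffer in a multiple of 90 degrees. The sketch below is self-contained and does not call the library; the `RgbaImage` interface is a stand-in with the same fields as the browser's `ImageData`, and `rotateImage` is a hypothetical helper name.

```typescript
// Stand-in for the browser's ImageData so the sketch also runs in Node.
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8ClampedArray; // RGBA, 4 bytes per pixel, row-major
}

type QuarterTurn = 0 | 90 | 180 | 270;

// Rotate an RGBA image clockwise by a multiple of 90 degrees.
function rotateImage(src: RgbaImage, degrees: QuarterTurn): RgbaImage {
  const w = src.width;
  const h = src.height;
  const swapAxes = degrees === 90 || degrees === 270;
  const outW = swapAxes ? h : w;
  const outH = swapAxes ? w : h;
  const out = new Uint8ClampedArray(src.data.length);

  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      // Destination coordinates of source pixel (x, y).
      let dx: number;
      let dy: number;
      switch (degrees) {
        case 90:
          dx = h - 1 - y;
          dy = x;
          break;
        case 180:
          dx = w - 1 - x;
          dy = h - 1 - y;
          break;
        case 270:
          dx = y;
          dy = w - 1 - x;
          break;
        default:
          dx = x;
          dy = y;
      }
      const srcOffset = (y * w + x) * 4;
      const dstOffset = (dy * outW + dx) * 4;
      // Copy the four RGBA bytes of this pixel.
      out.set(src.data.subarray(srcOffset, srcOffset + 4), dstOffset);
    }
  }
  return { width: outW, height: outH, data: out };
}
```

In a browser, the returned fields can be wrapped with `new ImageData(result.data, result.width, result.height)` before handing the image to the OCR step. Note that this only handles quarter turns; small skews such as the Hacker News test case above need a different correction.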