msys2 / MINGW-packages

Package scripts for MinGW-w64 targets to build under MSYS2.
https://packages.msys2.org
BSD 3-Clause "New" or "Revised" License

[tesseract-ocr] tesseract-ocr extremely slow #6255

Open lattice0 opened 4 years ago

lattice0 commented 4 years ago

https://packages.msys2.org/package/mingw-w64-x86_64-tesseract-ocr?repo=mingw64

This package gives extremely slow OCR.

I'm testing this code https://github.com/cppan/tesseract_example/blob/master/with_cmake/src/main.cpp which is a basic example of tesseract.

As you can see, the image that comes with it (https://github.com/cppan/tesseract_example/blob/master/phototest.tif) is very simple.

However, my computer takes 10 seconds to recognize this image's text using this source code. Both debug and release builds take the same amount of time.

This is on a Windows VM with 4 cores: not the fastest machine, but decent (it plays YouTube normally, etc.).

I compiled the same example on Linux with packages from the Ubuntu repository, and it takes about 0.3 seconds to recognize the same image.

Who compiled this library? Did they turn on optimizations?

dolang commented 4 years ago

I've stumbled on another problem with tesseract (#6382) and have been investigating it for a bit.

While at it, I found some mentions of problems with OpenMP on MinGW. According to the release notes, OpenMP is now disabled by default, but maybe it is still enabled in your Linux version, which would make it quite a bit faster?

See here:
https://github.com/tesseract-ocr/tesseract/issues/2035
https://github.com/tesseract-ocr/tesseract/issues/1662#issuecomment-401349871
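
One way to check the OpenMP guess is to time the same recognition twice, once with the default environment and once with the standard OpenMP variable `OMP_THREAD_LIMIT` set to 1. The Python sketch below does that for an arbitrary command. The helper names (`time_command`, `compare_openmp`) and the example invocation `tesseract phototest.tif stdout` are only illustrative assumptions, not something taken from the package or from the linked example. The variable only has an effect on a build that is actually linked against OpenMP, so identical timings would suggest OpenMP is not the explanation.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, extra_env=None):
    """Run cmd once and return its wall-clock duration in seconds."""
    env = os.environ.copy()
    if extra_env:
        env.update(extra_env)
    start = time.perf_counter()
    # Output is discarded; a non-zero exit status raises CalledProcessError.
    subprocess.run(
        cmd,
        env=env,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def compare_openmp(cmd):
    """Time cmd with the default environment and with OpenMP limited to one thread."""
    default_seconds = time_command(cmd)
    single_thread_seconds = time_command(cmd, {"OMP_THREAD_LIMIT": "1"})
    return default_seconds, single_thread_seconds


if __name__ == "__main__":
    # Assumed usage: python compare_openmp.py tesseract phototest.tif stdout
    if len(sys.argv) > 1:
        default_s, single_s = compare_openmp(sys.argv[1:])
        print(f"default environment: {default_s:.2f} s")
        print(f"OMP_THREAD_LIMIT=1:  {single_s:.2f} s")
    else:
        print("usage: compare_openmp.py <command> [args...]")
```

A single run per configuration is noisy, so repeating the measurement a few times on both the Windows VM and the Linux machine would give a fairer comparison.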