tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Enabling openmp leads to 10x performance regression on mingw-w64 #2035

Closed jeroen closed 5 years ago

jeroen commented 5 years ago

Environment

Problem

I updated the R bindings from Tesseract 3 to 4 on Windows and macOS. On Windows, however, I noticed an enormous performance regression: basic OCR examples that previously took about 1 second now take over 10 seconds.

I noticed that the default is now to build with OpenMP, which was not the case in Tesseract 3. I was able to solve the problem by rebuilding with --disable-openmp. Perhaps that should be the default on Windows?

zdenop commented 5 years ago

Your approach is wrong: if something does not work for you, it should be turned off?

Please try to search before you post an issue. Version 4 has been in public testing for the last 2 years, so there must be more information and experience out there than just yours, e.g.: https://github.com/tesseract-ocr/tesseract/issues/1662#issuecomment-401349871 https://github.com/tesseract-ocr/tesseract/issues/1171#issuecomment-337452383 https://github.com/tesseract-ocr/tesseract/issues/1081

amitdo commented 5 years ago

Missing info:

There is a Tesseract binary for MinGW-w64, prepared by @stweil: https://github.com/UB-Mannheim/tesseract/wiki

The old engine is still available in 4.0.0.

jeroen commented 5 years ago

We're building it natively on Windows as part of msys2 (with autotools): https://github.com/Alexpux/MINGW-packages/blob/master/mingw-w64-tesseract-ocr/PKGBUILD

This example used the stock eng and osd training data from tessdata_fast. The same performance differences between T3 and T4 appear on all hardware I have tested (I'm running on 8x 2.7 GHz Intel Core i7 myself).

jeroen commented 5 years ago

Is there a benchmarking C++ program that I can link with my build to diagnose the performance?
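
For the kind of comparison asked about here, a small wall-clock timing helper may be enough. The sketch below is not part of Tesseract; `median_ms` is a hypothetical name introduced for illustration, and it uses only the C++ standard library. It runs a callable several times and reports the median elapsed time, which is less sensitive to a slow first run than a single measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper (not a Tesseract API): runs `work` `runs` times and
// returns the median wall-clock duration of one run in milliseconds.
double median_ms(const std::function<void()>& work, int runs) {
    assert(runs > 0);  // at least one sample is needed for a median
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        // steady_clock is monotonic, so it is suitable for measuring intervals
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    // For an even number of samples this returns the upper of the two middle values
    return samples[samples.size() / 2];
}
```

In practice the callable would wrap the OCR step on an already initialized engine, for example a call to `TessBaseAPI::GetUTF8Text()` after `SetImage()`, and the same program would be linked once against the default build and once against a build configured with --disable-openmp. Measuring wall-clock time rather than CPU time matters here, because a multithreaded build can consume far more CPU time than the elapsed time suggests.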