tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0
61.31k stars 9.41k forks

Openmp cannot be disabled #4251

Closed Kpaybo closed 4 months ago

Kpaybo commented 4 months ago

Current Behavior

Despite setting OMP_THREAD_LIMIT=1, Tesseract still uses all cores on my machine (7th-gen i7, Windows 10).

We set it at the beginning of our code: Environment.SetEnvironmentVariable("OMP_THREAD_LIMIT", "1");

And we also tried this batch file:

@echo off
set OMP_THREAD_LIMIT=1
start "" "path_to_your_application.exe"

We have one big image. Tesseract takes 7 seconds when we process it in a single pass.

When we split the image into 4 parts and run 4 instances of Tesseract in parallel, it also takes 7 seconds: no change at all.
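For comparison, here is a minimal Python sketch of the experiment described above: one process per image part, each started with OMP_THREAD_LIMIT=1 in its environment. It drives the tesseract command-line program rather than the .NET wrapper, so it only isolates the engine from the wrapper and does not reproduce the C# setup. The tile file names and both helper names are hypothetical.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def single_thread_env(limit=1):
    """Return a copy of the current environment with OMP_THREAD_LIMIT set."""
    env = os.environ.copy()
    env["OMP_THREAD_LIMIT"] = str(limit)
    return env


def run_parallel(commands, env):
    """Start every command as its own process at the same time; return each stdout."""
    def run(cmd):
        done = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
        return done.stdout

    # One worker thread per command, so all processes run concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(run, commands))


def ocr_tiles(tiles):
    """OCR each tile with the tesseract CLI; "stdout" makes it print the text."""
    commands = [["tesseract", tile, "stdout"] for tile in tiles]
    return run_parallel(commands, single_thread_env())
```

Calling `ocr_tiles(["tile_0.png", "tile_1.png", "tile_2.png", "tile_3.png"])` and timing it against a single run on the full image would show whether the engine itself scales across processes. If it does, the bottleneck is more likely in how the wrapper is called.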

Expected Behavior

1) We should see in the Task Manager that Tesseract only uses 1 core.

2) There should be a significant improvement when running 4 images in parallel. Multiple people have had success with this method.

Suggested Fix

No response

tesseract -v

No response

Operating System

No response

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

stweil commented 4 months ago

Please use the Tesseract user forum for questions.

The Tesseract for Windows which is provided by UB Mannheim does not have this issue: it always runs single-threaded because it was built with OpenMP disabled. You did not say which Tesseract binary and which version you used.

Kpaybo commented 4 months ago

> The Tesseract for Windows which is provided by UB Mannheim does not have this issue: it always runs single-threaded because it was built with OpenMP disabled. You did not say which Tesseract binary and which version you used.

I will ask the question on the Tesseract forum. Sorry, I thought the problem came from Tesseract.

We are using a .NET wrapper for Tesseract, created by charlessw. It uses Tesseract 5.2.0 (our application that needs Tesseract is written in C#).

I don't understand this OpenMP discussion. You're right, I saw multiple issues saying that OpenMP is disabled, but at the same time multiple issues say that you should set OMP_THREAD_LIMIT to 1.

Wouldn't that be useless if OpenMP was disabled?

stweil commented 4 months ago

Yes, for builds without OpenMP, setting OMP_THREAD_LIMIT has no effect.
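One way to find out which kind of build is installed is to look at the version banner. The sketch below assumes that OpenMP-enabled builds of the command-line program print a line such as "Found OpenMP 201511" in the output of `tesseract --version`; that assumption is worth checking against your own build. It inspects the CLI only and says nothing about the library bundled with a wrapper. Both function names are hypothetical.

```python
import subprocess


def has_openmp(version_text):
    """Return True if the version banner mentions OpenMP (case-insensitive)."""
    return "openmp" in version_text.lower()


def tesseract_version_text(executable="tesseract"):
    """Return everything printed by `<executable> --version`."""
    result = subprocess.run([executable, "--version"], capture_output=True, text=True)
    # Older releases print the banner on stderr, so collect both streams.
    return result.stdout + result.stderr
```

With the CLI on the PATH, `has_openmp(tesseract_version_text())` returning False would suggest that OMP_THREAD_LIMIT cannot change anything for that binary.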

Kpaybo commented 4 months ago

I don't understand why we get the same time when running in parallel, then.

Our best guess is that it comes from the wrapper. I hope it does. We'll investigate that first. If that doesn't work, I'll come back here with more questions.

Thanks for your answers.