Open oanamocean opened 4 years ago
You're initializing the API for each request, which probably adds significant overhead. Try initializing a pool of PyTessBaseAPI instances, reusing them in each thread, and see if that improves the run time.
Also, because of the GIL, I recommend using multiprocessing instead of multithreading.
The details depend on whether you want batch processing (for example, on a bunch of files) or on-demand processing (for example, in a server). For the former, see this example; for the latter, I recommend something based on mp.Queue and mp.Process.
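To make the mp.Queue and mp.Process suggestion concrete, here is a minimal sketch, not a tested production setup. It assumes tesserocr is installed and that each task is a (job_id, image_path) pair. The names TesseractEngine, worker, start_workers and STOP are made up for illustration and are not part of tesserocr. Each worker process creates its PyTessBaseAPI instance once and reuses it for every task, so the initialization cost is not paid per request.

```python
import multiprocessing as mp

STOP = None  # sentinel: put one on the task queue per worker to shut it down


class TesseractEngine:
    """Hypothetical wrapper that keeps one PyTessBaseAPI alive per process."""

    def __init__(self):
        # Imported lazily so this module loads even where tesserocr is absent.
        from tesserocr import PyTessBaseAPI
        self.api = PyTessBaseAPI()  # expensive step, done once per worker

    def recognize(self, image_path):
        self.api.SetImageFile(image_path)
        return self.api.GetUTF8Text()

    def close(self):
        self.api.End()


def worker(tasks, results, engine_factory=TesseractEngine):
    """Build one engine, then serve (job_id, image_path) tasks until STOP."""
    engine = engine_factory()
    try:
        while True:
            task = tasks.get()
            if task is STOP:
                break
            job_id, image_path = task
            results.put((job_id, engine.recognize(image_path)))
    finally:
        engine.close()


def start_workers(n_workers, engine_factory=TesseractEngine):
    """Start n_workers processes that share one task and one result queue."""
    tasks, results = mp.Queue(), mp.Queue()
    procs = [
        mp.Process(target=worker, args=(tasks, results, engine_factory), daemon=True)
        for _ in range(n_workers)
    ]
    for proc in procs:
        proc.start()
    return tasks, results, procs
```

In a server you would call start_workers once at startup (under an `if __name__ == "__main__":` guard), put one (job_id, image_path) tuple on tasks per request, and match results back to requests by job_id, since they can come back out of order. At shutdown, put one STOP on the queue for each worker.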
Hey, I have an API that uses this code to predict text from different images, but I'm having trouble understanding why the performance is so bad when I run multiple requests in parallel.
If I run one request, it takes around 2 seconds, but if I run 10 requests at the same time, it takes 40 seconds. I've read a lot about how to optimise this and get better times, and I've tried setting different Tesseract variables and configurations, but I still couldn't find a solution. I've also set OMP_THREAD_LIMIT to 1, but it's not enough.
Any ideas about this?