Samll-Kosmos closed this 4 years ago
@Samll-Kosmos Hi, pytesseract itself supports a timeout argument. Please check out the documentation for examples and more info. When the timeout is reached, pytesseract should kill the related tesseract process. Keep in mind that this method is not graceful, and you should not rely on getting a result once the timeout is reached.
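For readers who have not used the timeout argument, here is a minimal sketch. The helper name `ocr_with_timeout` is made up for illustration; the `timeout` keyword and the `RuntimeError` raised on expiry follow the pytesseract README. pytesseract is imported inside the function so the snippet loads even where the package is not installed.

```python
def ocr_with_timeout(image_path, seconds):
    """Return the OCR text, or None if tesseract ran longer than `seconds`.

    Hypothetical helper, not part of pytesseract.
    """
    import pytesseract  # lazy import: only needed when the function is called

    try:
        return pytesseract.image_to_string(image_path, timeout=seconds)
    except RuntimeError:
        # pytesseract signals a timeout with RuntimeError after terminating
        # the tesseract process. Other tesseract failures may derive from
        # RuntimeError too, so inspect the exception if you need to tell
        # them apart.
        return None
```

Note that the timeout is fixed when the call starts, which is why it cannot follow an external event such as a web server deadline.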
Thanks for the quick reply.
Using the timeout argument partly solves my problem, but it is not optimal. I was looking for a way to kill the process at an arbitrary point in time (in my case, as soon as the web server reaches its configured timeout). But I think this is not possible unless the extraction function (e.g. image_to_data) exposes some information about the process being executed.
A possible solution is to pass a mutable object to image_to_data that receives information about the process. Something like this:
process_info = {}
image_to_data(image, lang, config, nice, output_type, timeout, pandas_config, process_info)
Then process_info can contain a field called PID, holding the PID of the tesseract process as soon as that process starts. But this doesn't sound like a good solution.
Python is a very dynamic language, so if you want, you can redefine pytesseract.pytesseract.timeout_manager.
And you can replace it with your custom version that can report the PIDs to you.
You can even reuse the timeout argument for bidirectional communication.
Thank you for the idea. I'm going to implement it like that. Maybe I'll post a snippet here to show others my solution.
Hi,
I'm currently using Gunicorn as the WSGI server for a web service. In the service I use pytesseract to process documents and extract information from them. I have configured Gunicorn with a timeout. My problem is that when the timeout is reached and the connection is closed, the tesseract process keeps running in the background. Is there a way to manually kill this tesseract process?
Many thanks