ianzhao05 / textshot

Python tool for grabbing text via screenshot
MIT License
1.73k stars, 259 forks

Tesseract Process Timeout #23

Closed. edwardsaunders7 closed this issue 4 years ago

edwardsaunders7 commented 4 years ago

TextShot does not appear to work on more than 5 words at a time. It fails with this error:

"TextShot" "An error occurred when trying to process the image: Tesseract process timeout"

ianzhao05 commented 4 years ago

Could you share the image that caused it to fail? This is a Tesseract issue that is caused by the input image. You can also try increasing the timeout in the code (default 2 seconds).
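For anyone who wants to experiment with the timeout, here is a minimal sketch of an OCR call with a configurable timeout. It assumes the OCR goes through `pytesseract.image_to_string`, which accepts a `timeout` argument and raises `RuntimeError` when the Tesseract process runs too long. The helper name `ocr_with_timeout` and its `ocr` parameter are hypothetical and are not part of textshot.

```python
def ocr_with_timeout(image, timeout_seconds=10, ocr=None):
    """Run OCR on `image`; return the stripped text, or None on failure.

    `ocr` is a hypothetical hook so the helper can be exercised without
    Tesseract installed. By default pytesseract.image_to_string is used.
    """
    if ocr is None:
        # Imported lazily so the function can be tested with a stand-in.
        import pytesseract
        ocr = pytesseract.image_to_string
    try:
        return ocr(image, timeout=timeout_seconds).strip()
    except RuntimeError as exc:
        # pytesseract signals a timed-out Tesseract process with RuntimeError.
        print(f"OCR failed: {exc}")
        return None
```

Raising `timeout_seconds` only hides slow runs; if Tesseract regularly needs more than a few seconds for short text, the underlying cause is worth tracking down separately.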

edwardsaunders7 commented 4 years ago

It doesn't appear to be related to the input image. I tried it on several different inputs and the same error occurs. I've attached some of the images that produced it:

2020-06-27_00-28_1

2020-06-27_00-30

2020-06-27_00-31

ianzhao05 commented 4 years ago

Ok, thanks. Does it work if the text is dark on a light background? If not, maybe the screenshot is taking the wrong image.

edwardsaunders7 commented 4 years ago

Colour doesn't seem to have an effect. I just tested with these two inputs: 2020-06-27_00-38_1 2020-06-27_00-38

ianzhao05 commented 4 years ago

Ok, if you add something like pil_img.show() at around line 85, does it show the correct image? By the way, what OS are you on?

edwardsaunders7 commented 4 years ago

Added pil_img.show() - shows the proper image.

Editing the timeout on line 90:

On Manjaro Linux.

Further testing with a 60-second timeout:

Multiple input images of varying text lengths all complete quickly, without the timeout error. Perhaps making this change in the git project would be beneficial?

ianzhao05 commented 4 years ago

The behaviour you are describing is really strange. On my laptop (mid-range specs), it usually takes around a second, even for large amounts of text. If you don’t mind, can you test with the tesseract executable directly and see how long it takes? https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage
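One way to time the executable outside of textshot is a small wrapper around `subprocess.run`. This is only a sketch: `time_command` is a hypothetical helper, and the image filename in the commented example is an assumption. The `tesseract <image> stdout` form, which prints the recognised text to standard output, comes from the command-line usage page linked above.

```python
import subprocess
import time


def time_command(cmd, timeout_seconds=60):
    """Run `cmd` and return (elapsed_seconds, stdout).

    Returns (None, None) if the command exceeds `timeout_seconds`.
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    start = time.perf_counter()
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None, None
    return time.perf_counter() - start, result.stdout


# Example (assumes tesseract is on PATH and eurotext.png exists):
# elapsed, text = time_command(["tesseract", "eurotext.png", "stdout"])
# print(f"Tesseract took {elapsed:.2f}s")
```

Comparing this number with the time textshot takes on the same image should show whether the delay comes from Tesseract itself or from somewhere else in the pipeline.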

edwardsaunders7 commented 4 years ago

Tested with a few input images.

I have added 2 of the images used (the other 2 contain personal information)

Image 3:

Test

Image 4: eurotext

ianzhao05 commented 4 years ago

Thank you for your testing. It seems that Tesseract is taking a lot longer for you. What kind of processor does your computer have? Also, what is your Tesseract version? My PC has an i7-6700K, and my laptop an i5-7200U, and I haven't really had any timeout issues. I do agree that 2 seconds is too low, however, so I will update that.

edwardsaunders7 commented 4 years ago

I am running a "4 x Intel Core i5-4690K CPU @ 3.50GHz" with "15.5GiB of RAM"

I tested textshot on my laptop and my desktop previously and had no issues. The problem only appeared when I reinstalled textshot today. I have no idea why Tesseract is taking longer for me, so let me test a few more things and see if I can find an answer.

edwardsaunders7 commented 4 years ago

Update: I just realised I had my VPN on, so network speeds are likely the cause of the issue.

I just disconnected my VPN, and Tesseract and textshot both finish within a few seconds (using all the same test images).

Apologies for not realising that could be the issue beforehand!

ianzhao05 commented 4 years ago

I honestly didn't know before that Tesseract was affected by network speeds! No apology needed; glad it works now :)