Closed KaKi87 closed 4 months ago
Closing as not a bug. Tesseract.js returns an empty string when no text is detected, so the fact that it does not throw an error is an intended behavior.
The fact that no text is returned for this particular image is also not a bug, as this appears to be a CAPTCHA, and therefore was specifically designed to not be recognizable by Tesseract (and similar programs).
> Tesseract.js returns an empty string when no text is detected, so the fact that it does not throw an error is an intended behavior.
Well, then it would be nice to mention this in the API documentation.
That said, I don't feel it makes sense to return success on failure 🤔
In general, runtime errors should only be thrown when a program fails to run to completion. If Tesseract recognition fails to run (return code `1`), an error will be thrown. If Tesseract runs and exits successfully (return code `0`), that will not throw an error, even if the results happen to be incorrect (which Tesseract has no way of knowing). Furthermore, there is no reason to assume finding no text on a page is incorrect. The single most common use of OCR is document scanning, and documents frequently contain pages with no text.
> The single most common use of OCR is document scanning, and documents frequently contain pages with no text.
I see.
Still, I wouldn't have created this issue if this had been mentioned in the API documentation, especially considering that I've already successfully used this lib for solving captchas from different sources.
Thanks
Okay, I've added a warning to api.md that states that exceptions are not thrown when no text is detected.
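For callers who do want an empty result treated as a failure, the check can live in application code. The following is only a minimal sketch: `NoTextDetectedError` and `requireText` are hypothetical names that are not part of Tesseract.js, and the helper assumes the result has the `data.text` shape described in this issue.

```javascript
// Hypothetical error type for illustration; not part of Tesseract.js.
class NoTextDetectedError extends Error {
  constructor(message = 'OCR completed but detected no text') {
    super(message);
    this.name = 'NoTextDetectedError';
  }
}

// Hypothetical helper: returns the trimmed text, or throws if it is empty.
// Assumes `result` has the shape { data: { text: string } }.
function requireText(result) {
  const text = (result?.data?.text ?? '').trim();
  if (text === '') {
    throw new NoTextDetectedError();
  }
  return text;
}

// Possible usage with Tesseract.js (not executed here):
//   const result = await Tesseract.recognize(image, 'eng');
//   const text = requireText(result);
```

This keeps the library's behavior unchanged (no exception on an empty page) while letting a caller that always expects text, such as a CAPTCHA solver, opt in to stricter handling.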
Thanks !
Tesseract.js version (version number for npm/GitHub release, or specific commit for repo) 5.0.4
Describe the bug
To Reproduce
Steps to reproduce the behavior:
1. `Tesseract.recognize`
2. `data.text` contains `''` and no error is thrown

Please attach any input image required to replicate this behavior.
![](https://github.com/naptha/tesseract.js/assets/21284089/5a70ef3e-d673-4e95-ad25-5d899f400a54)
Expected behavior
`data.text` contains the content of the image.
Device Version:
Additional context None
Thanks