Open nullpointersetc opened 2 years ago
Could you please provide an image which triggers this division by zero?
It does not make sense to simply add a check for the division. First we have to analyse why this function is called with endpt_.y() == startpt_.y() (so it is a point, not a vector).
In case of a point the length should be 1.
I currently don't have an image that I can give you.
@nullpointersetc, maybe some part of an image which can be published is sufficient to trigger the issue, or you can send me a confidential image by e-mail. I am afraid that we will have to close the issue without a fix if there is no test case.
I don't know how to construct such an image.
For example, if I try to construct such an image with this text:
I get back that the image is 2548 x 3298 (the original image was 1019 x 1319 at 120 DPI, so that may be explained)
TabVector::Evaluate is called only four times for this image. At the if statement I indicated, the values are:
startpt={xcoord=234 ycoord=971}, endpt={xcoord=234 ycoord=3026}, and prev_good_box={bot_left={xcoord=237 ycoord=2994} top_right={xcoord=272 ycoord=3026}}
startpt={xcoord=2270 ycoord=961}, endpt={xcoord=2270 ycoord=3016}, and prev_good_box={bot_left={xcoord=2147 ycoord=2994} top_right={xcoord=2167 ycoord=3016}}
startpt={xcoord=234 ycoord=971}, endpt={xcoord=234 ycoord=3026}, and prev_good_box={bot_left={xcoord=237 ycoord=2994} top_right={xcoord=272 ycoord=3026}}
startpt={xcoord=2270 ycoord=961}, endpt={xcoord=2270 ycoord=3016}, and prev_good_box={bot_left={xcoord=2147 ycoord=2994} top_right={xcoord=2167 ycoord=3016}}
I DO NOT know how to interpret these numbers. I would have assumed that they are pixel offsets from the top-left of the image, but startpt and endpt all seem to refer to a vertical region of the image that is one pixel wide and consists of only white pixels, while the good boxes appear to be all white pixels. Am I on the right track in trying to come up with an image?
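As a quick check on the values above, here is a minimal sketch that computes the vertical extent of each reported vector. It assumes that length is endpt.y() minus startpt.y(), as the issue description implies; the Point struct and the VerticalExtent function are hypothetical names used only for illustration and are not Tesseract types.

```cpp
// Hypothetical stand-in for a point with integer coordinates;
// not the Tesseract point class.
struct Point {
  int x;
  int y;
};

// Vertical extent of a vector, assuming length = end.y - start.y.
int VerticalExtent(const Point& start, const Point& end) {
  return end.y - start.y;
}
```

With the logged values, VerticalExtent(Point{234, 971}, Point{234, 3026}) and VerticalExtent(Point{2270, 961}, Point{2270, 3016}) both give 2055, so under this assumption none of the four calls on this test image has a zero length.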
Did you try to use version 5.1.0 with the same image?
At the beginning of this method, length == 0 is checked as part of a condition.
I don't expect that 5.1.0 or our latest code fixed this issue. @nullpointersetc, it would really help if you could provide an image which triggers the bug. You can send it to my personal e-mail address, and I will keep it private.
@nullpointersetc, it would also be interesting to know whether the same bug occurs on Linux or macOS. Could you please test it (that's also possible on Windows with WSL)?
Environment
Current Behavior:
On a certain image, an integer division-by-zero exception occurs and the OCR program using Tesseract as a library is terminated.
We have determined that the problem is in method TabVector::Evaluate in src/textord/tabvector.cpp, and specifically in this section of the code:
There is no validation before the assignment to percentscore that length is not zero (i.e., that endpt.y() does not equal startpt.y()).
Expected Behavior:
The integer division is not attempted and the process does not abort.
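The expected behavior can be illustrated with a minimal sketch. This is not the code from tabvector.cpp: ComputePercentScore, good_length and the out-parameter are hypothetical names, and the formula 100 * good_length / length is an assumption about what the percentage calculation looks like. The only point is that the division is skipped when the length is zero.

```cpp
// Hypothetical helper, not part of Tesseract.
// Writes the percentage of the vector length covered by good_length to
// *percent_score and returns true. Returns false, leaving *percent_score
// untouched, when the length is zero, so the division is never attempted.
bool ComputePercentScore(int good_length, int start_y, int end_y,
                         int* percent_score) {
  const int length = end_y - start_y;  // assumed definition of length
  if (length == 0) {
    return false;  // degenerate vector: caller decides how to score it
  }
  *percent_score = 100 * good_length / length;
  return true;
}
```

Returning a flag rather than silently substituting a value leaves the decision to the caller, which matters here because it is still unclear why the function would be called with a zero-length vector in the first place.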
Suggested Fix: