tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Potential Null Pointer Dereference in Function `RecodeBeamSearch::ContinueContext` #4247

Closed (hribz closed this 1 month ago)

hribz commented 1 month ago

Current Behavior

In the function `RecodeBeamSearch::ContinueContext`, if the condition on line 906 is false, the statement `previous = previous->prev`, which runs at the end of every iteration of the `for` loop, dereferences a null pointer.

https://github.com/tesseract-ocr/tesseract/blob/5d5a633a5d7abfb155a605be90f8033f82e9744f/src/lstm/recodebeam.cpp#L901-L910
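
To make the hazard concrete: in C++ the increment expression of a `for` statement runs after every execution of the body, so `previous = previous->prev` is evaluated even in an iteration where the body's `previous != nullptr` test failed. The sketch below is a simplified model, not the Tesseract source: `Node` is a hypothetical stand-in for `RecodeNode` holding only the fields the loop reads, `null_char` plays the role of `null_char_`, and `IncrementWouldDereferenceNull` is a hypothetical helper that writes the increment out by hand so the null case is reported instead of crashing.

```cpp
// Hypothetical stand-in for RecodeNode: only the fields the loop reads.
struct Node {
  int code;
  bool duplicate;
  const Node *prev;
};

// Mirrors the shape for (p = length - 1; p >= 0; --p, previous = previous->prev).
// Returns true if that increment would evaluate previous->prev with
// previous == nullptr. The increment is spelled out by hand so the null
// case is reported instead of crashing.
bool IncrementWouldDereferenceNull(const Node *previous, int length,
                                   int null_char) {
  for (int p = length - 1; p >= 0; --p) {
    // Body: skip duplicates and null characters, guarding against nullptr.
    while (previous != nullptr &&
           (previous->duplicate || previous->code == null_char)) {
      previous = previous->prev;
    }
    // (Guarded work such as prefix.Set(p, previous->code) would go here.)

    // Increment: the modeled loop evaluates previous->prev unconditionally.
    if (previous == nullptr) return true;
    previous = previous->prev;
  }
  return false;
}
```

With a chain of two usable nodes, `length` 2 is safe and `length` 3 is not; skipped nodes (duplicates or `null_char`) shorten the usable chain further. With `length` 0 the body never runs at all.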

Suggested Fix

If `previous` can be nullptr, an error-handling branch should be added, as shown below:

if (previous != nullptr) {
    ...
} else {
    // Add error handling code here
}

If `previous` can never be nullptr, the checks for `previous` could be removed, as shown below:

while (previous->duplicate || previous->code == null_char_) {
  previous = previous->prev;
}
prefix.Set(p, previous->code);
full_code.Set(p, previous->code);

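
A minimal sketch of this second variant as a complete loop, using the same hypothetical `Node` model (not Tesseract's actual types; a `std::vector<int>` stands in for `RecodedCharID`). Documenting the "cannot be nullptr" invariant with `assert` is an assumption of this sketch, not something proposed in the thread: it checks the invariant in debug builds and compiles away under `NDEBUG`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for RecodeNode: only the fields the loop reads.
struct Node {
  int code;
  bool duplicate;
  const Node *prev;
};

// Unchecked walk (assumes length >= 0): correct only if the chain starting
// at `previous` holds at least `length` nodes that are neither duplicates
// nor `null_char`.
std::vector<int> CollectCodesUnchecked(const Node *previous, int length,
                                       int null_char) {
  std::vector<int> codes(length);
  for (int p = length - 1; p >= 0; --p, previous = previous->prev) {
    assert(previous != nullptr && "chain shorter than length");
    // Skip duplicates and null characters, as in the snippet above.
    while (previous->duplicate || previous->code == null_char) {
      previous = previous->prev;
      assert(previous != nullptr && "ran off the chain while skipping");
    }
    codes[p] = previous->code;  // stands in for prefix.Set / full_code.Set
  }
  return codes;
}
```

The increment after the last iteration is safe here because `previous` was just used and is therefore non-null; the value it produces is never read.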
stweil commented 1 month ago

I never had a NULL pointer dereference in this function and never saw a bug report about one. Therefore I think the checks should be removed.

Do you want to send a pull request?

hribz commented 1 month ago

> I never had a NULL pointer dereference in this function and never saw a bug report about one. Therefore I think the checks should be removed.
>
> Do you want to send a pull request?

Yes, I have sent a pull request.

egorpugin commented 1 month ago

From the name `previous`, it can be nullptr.

hribz commented 1 month ago

> From the name `previous`, it can be nullptr.

But if the `for` loop is entered while `previous` is nullptr, a null pointer dereference is inevitable. Currently, there seems to be no path on which `previous` is nullptr when the `for` loop is entered. Or do you think an error-handling branch should be added?

stweil commented 1 month ago

> From the name `previous`, it can be nullptr.

Yes, but obviously the loop always terminates before the nullptr is reached. Otherwise we'd have lots of Tesseract crashes.

egorpugin commented 1 month ago

Change `for (int p = length - 1; p >= 0; --p, previous = previous->prev) {` to `for (int p = length - 1; p >= 0 && previous; --p, previous = previous->prev) {` and remove the checks from inside the loop.
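
A minimal sketch of this loop shape, again on the hypothetical `Node` model rather than Tesseract's types, with `-1` as a made-up marker for positions that are never filled. One C++ detail matters here: the `for` condition is tested only between iterations, after the increment has already run. The `previous` test in the condition therefore protects the start of each iteration, but if the skip loop inside the body walks off the end of the chain, the increment would still evaluate `previous->prev` on nullptr. This sketch therefore keeps a test in the skip loop and leaves with `break`, which exits without running the increment.

```cpp
#include <vector>

// Hypothetical stand-in for RecodeNode: only the fields the loop reads.
struct Node {
  int code;
  bool duplicate;
  const Node *prev;
};

// Walks back over at most `length` usable nodes and stops early when the
// chain ends. Unfilled positions keep the hypothetical marker -1.
std::vector<int> CollectCodesBounded(const Node *previous, int length,
                                     int null_char) {
  std::vector<int> codes(length, -1);
  for (int p = length - 1; p >= 0 && previous != nullptr;
       --p, previous = previous->prev) {
    // Skip duplicates and null characters; this can reach the end of the chain.
    while (previous != nullptr &&
           (previous->duplicate || previous->code == null_char)) {
      previous = previous->prev;
    }
    // `break` leaves the loop without running the increment, so
    // previous->prev is never evaluated on nullptr.
    if (previous == nullptr) break;
    codes[p] = previous->code;  // stands in for prefix.Set / full_code.Set
  }
  return codes;
}
```

Compared with the unchecked variant, this version never faults on a short chain; the cost is that callers must tolerate positions that were never filled.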

stweil commented 1 month ago

@egorpugin, would you prefer the nullptr check in the for statement although that case never occurred up to now?

stweil commented 1 month ago

I just ran `make check` and found that the body of the `for` loop is never executed, because `length` is always 0 in that run.

egorpugin commented 1 month ago

> @egorpugin, would you prefer the nullptr check in the for statement although that case never occurred up to now?

Yes.

Beyond that, there is a question about this issue as a whole: it reports a "potential" dereference. Does "potential" mean it is not an actual issue?

Just do a quick refactor that moves the condition from inside the loop into the `for` statement, and that's enough. You even discovered that the loop body is never executed, so don't touch it or change its semantics without knowing what it does or what it is for.

So, checking `previous` for nullptr in the `for` statement LGTM.

stweil commented 1 month ago

I updated the PR. Please review.