tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

tesseract ocr big size pic dump #3885

Open wuyang-dl opened 2 years ago

wuyang-dl commented 2 years ago

Hi, the API function void TessBaseAPI::SetImage(Pix *pix) crashes with a core dump when handling a large image (when the system does not have enough memory).

    void TessBaseAPI::SetImage(Pix *pix) {
      if (InternalSetImage()) {
        if (pixGetSpp(pix) == 4 && pixGetInputFormat(pix) == IFF_PNG) {
          // remove alpha channel from png
          Pix *p1 = pixRemoveAlpha(pix);
          pixSetSpp(p1, 3);
          (void)pixCopy(pix, p1); // <---- bug
          pixDestroy(&p1);
        }
        thresholder_->SetImage(pix);
        SetInputImage(thresholder_->GetPixRect());
      }
    }

The Leptonica function pixCopy(pix, p1) returns pixd, or NULL on error, so the return value of pixCopy needs to be checked.

Environment

Current Behavior:

Tesseract crashes with a core dump.

Expected Behavior:

Tesseract OCR works (no core dump).

Suggested Fix:

Possible fix:

    void TessBaseAPI::SetImage(Pix *pix) {
      if (InternalSetImage()) {
        if (pixGetSpp(pix) == 4 && pixGetInputFormat(pix) == IFF_PNG) {
          // remove alpha channel from png
          Pix *p1 = pixRemoveAlpha(pix);
          pixSetSpp(p1, 3);

          // fix-begin
          if (pixCopy(pix, p1) == NULL) {
            pixDestroy(&p1);
            recognition_done_ = false; // maybe
            return;
          }
          // fix-end

          pixDestroy(&p1);
        }
        thresholder_->SetImage(pix);
        SetInputImage(thresholder_->GetPixRect());
      }
    }

Thanks.

amitdo commented 1 year ago

@stweil,

IMO, we should undo https://github.com/tesseract-ocr/tesseract/commit/57b79742920c

stweil commented 1 year ago

That would not fix the issue here, which is caused by missing error handling.
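For illustration, the kind of error handling discussed in this thread could look roughly like the sketch below. It is not Tesseract or Leptonica code: Image, RemoveAlpha, CopyInto, PrepareImage and the two g_fail_* flags are hypothetical stand-ins, used only so the pattern can be compiled and tried without either library. The only assumption taken from the report is the "returns a pointer, or NULL on error" convention. The sketch checks both the alpha-removal step and the copy step, and frees the temporary on every path.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Leptonica's Pix; not the real type.
struct Image {
  int spp = 4;                      // samples per pixel
  std::vector<unsigned char> data;  // pixel payload
};

// Demo-only switches that simulate a failure (e.g. out of memory).
static bool g_fail_remove_alpha = false;
static bool g_fail_copy = false;

// Stand-in for an alpha-removal call: returns a new image, or nullptr on error.
Image *RemoveAlpha(const Image *src) {
  if (src == nullptr || g_fail_remove_alpha) return nullptr;
  Image *out = new Image(*src);
  out->spp = 3;
  return out;
}

// Stand-in for a copy call: returns dst, or nullptr on error.
Image *CopyInto(Image *dst, const Image *src) {
  if (dst == nullptr || src == nullptr || g_fail_copy) return nullptr;
  *dst = *src;
  return dst;
}

// Same shape as the alpha-removal branch in the report, but every step
// is checked and the caller is told when the image could not be prepared.
bool PrepareImage(Image *img) {
  assert(img != nullptr);
  if (img->spp == 4) {
    Image *p1 = RemoveAlpha(img);
    if (p1 == nullptr) return false;      // nothing allocated, nothing to free
    Image *copied = CopyInto(img, p1);
    delete p1;                            // temporary is freed on every path
    if (copied == nullptr) return false;  // img is left unmodified
  }
  return true;
}
```

Returning a status to the caller is a design choice made for this sketch. The real SetImage returns void, so an actual patch would need another way to signal the failure, as the "maybe" in the suggested fix hints.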