Open korhun opened 4 years ago
sample.zip If you have trouble reaching the sample TIFF file, here is a zip file. (It is actually a rar; I changed the extension in order to upload it :) Sorry, I don't have WinZip.)
Hey, sorry, the title is wrong. It should be "GetComponentImages Example", not "Iterator Example".
This function gives exactly the same result (words) as the GetText1 function:
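```cpp
std::string outTextStr;
Pix* image = pixRead(input);
api->SetImage(image);
api->Recognize(0);
tesseract::ResultIterator* ri = api->GetIterator();
tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
if (ri != nullptr) {
  do {
    // GetUTF8Text returns a new[]-allocated string that the caller must free.
    const char* c = ri->GetUTF8Text(level);
    if ((c != nullptr) && (c[0] != '\0')) {
      outTextStr += c;
      outTextStr += " ";
    }
    delete[] c;
  } while (ri->Next(level));
  delete ri;  // the iterator returned by GetIterator must be deleted after use
}
pixDestroy(&image);
// Caution: if the enclosing function returns const char*, this pointer dangles
// once outTextStr is destroyed; returning std::string avoids that.
return outTextStr.c_str();
```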
This is the image
I get the following results. I'm using the latest code. For example, GetText1 finds "EVENING NEWS", which GetText2 misses, and GetText2 finds "NORAH O’DONNELL", which GetText1 misses.
Is this normal? Is there a way that I can get all the found words?
GetText1 result: pr vw MD e ir, “a... a > | al \ (spray ga ğa - ef 2 ie ma 04 - ml is ve q . > e > > = | 2 “ 3 YE e eae | 8 o r ii b Ns a Ş = İğ a ay to a . | ’ E He Pe / ene a 200 MILLION AMERICANS IN PATH OF POWERFUL WINTER STORM (EN ©CBS EVENING NEWS A
GetText2 result: e ga)» “©. SATELLİTE- RADAR LOOP & a. Mim e | + a, a a, 200 MILLION AMERICANS IN PATH OF POWERFUL WINTER STORM | ~~ > NIN wt NORAH O’DONNELL
(The attached file is a TIFF. I had to change its extension to jpeg in order to upload it.)
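As a stopgap until I know whether the difference is expected, the two results could be merged by taking the union of their words. This is only a minimal sketch, assuming both results are plain whitespace-separated strings. `MergeWords` is a hypothetical helper, not a Tesseract function. It compares exact strings only, so noise tokens from both results are kept as well.

```cpp
#include <initializer_list>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of Tesseract): merges the whitespace-separated
// words of two recognition results, keeping first-seen order and dropping
// exact duplicates.
std::vector<std::string> MergeWords(const std::string& a, const std::string& b) {
  std::vector<std::string> merged;
  std::unordered_set<std::string> seen;
  for (const std::string* text : {&a, &b}) {
    std::istringstream in(*text);
    std::string word;
    while (in >> word) {
      // insert().second is true only the first time a word is seen.
      if (seen.insert(word).second) merged.push_back(word);
    }
  }
  return merged;
}
```

Calling `MergeWords(getText1Result, getText2Result)` (both variable names are placeholders) would then give one list containing both "EVENING NEWS" and "NORAH O’DONNELL", at the cost of also keeping the garbage tokens.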