tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Look only for digits not working #3094

Open illera88 opened 4 years ago

illera88 commented 4 years ago

Environment

    tessaretAPI = std::make_shared<tesseract::TessBaseAPI>();

    tessaretAPI->SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_LINE);
    tessaretAPI->SetVariable("tessedit_char_blacklist", "\n!?@#$%&*()<>_-+=/:;'\"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
    tessaretAPI->SetVariable("tessedit_char_whitelist", "0123456789");
    tessaretAPI->SetVariable("classify_bln_numeric_mode", "1");

    if (tessaretAPI->Init(".", "eng", tesseract::OEM_DEFAULT)) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        // throw error
        exit(1);
    }
    tessaretAPI->SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_LINE);
    tessaretAPI->SetVariable("tessedit_char_blacklist", "!?@#$%&*()<>_-+=/:;'\"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
    tessaretAPI->SetVariable("tessedit_char_whitelist", "0123456789");
    tessaretAPI->SetVariable("classify_bln_numeric_mode", "1");

// Later

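std::string OpenCV_sensor::detect_text(const cv::Mat& img)
{
    tessaretAPI->SetImage((uchar*)img.data, img.size().width, img.size().height, img.channels(), (int)img.step1());
    // GetUTF8Text() returns a heap-allocated buffer that the caller must free with delete[];
    // std::unique_ptr<char[]> (from <memory>) does that automatically.
    std::unique_ptr<char[]> text(tessaretAPI->GetUTF8Text());
    // Guard against a null result before building the std::string.
    return text ? std::string(text.get()) : std::string();
}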

I'm trying to detect only digits with Tesseract, but the results still contain non-numeric characters.

I've tried setting the restrictions both before and after calling Init, but the results are still alphanumeric.
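
As a stopgap while the whitelist has no effect, the string returned by detect_text could be filtered after the fact. The following is only a minimal sketch of that idea. `keep_digits` is a hypothetical helper name, not part of Tesseract, and it does not address why the engine ignores the restriction in the first place.

```cpp
#include <string>

// Hypothetical helper (not part of Tesseract): keep only the ASCII digits
// '0'..'9' from an OCR result and drop letters, punctuation, whitespace,
// and any non-ASCII bytes.
std::string keep_digits(const std::string& text)
{
    std::string digits;
    digits.reserve(text.size());
    for (char c : text) {
        if (c >= '0' && c <= '9') {
            digits.push_back(c);
        }
    }
    return digits;
}
```

It could be applied as `keep_digits(detect_text(img))`. Note that this only hides misrecognized characters. It cannot turn a letter that was read by mistake (for example an "O") back into the digit that was actually on the image.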

Moldoteck commented 4 years ago

Check the return of SetVariable. Is it true or false?
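
One way to run that check is sketched below. SetVariable returns a bool, and false means the variable name was not found. `apply_variables` and `FakeApi` are hypothetical names introduced for this illustration. The helper is a template, so the sketch compiles without the Tesseract headers; `FakeApi` is only a stand-in and does not reflect Tesseract's behaviour.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: pass each (name, value) pair to the object's
// SetVariable() and collect the names of the settings that were rejected.
// Any type with `bool SetVariable(const char*, const char*)` works,
// which is the shape of tesseract::TessBaseAPI::SetVariable.
template <typename Api>
std::vector<std::string> apply_variables(
    Api& api,
    const std::vector<std::pair<std::string, std::string>>& settings)
{
    std::vector<std::string> rejected;
    for (const auto& kv : settings) {
        if (!api.SetVariable(kv.first.c_str(), kv.second.c_str())) {
            std::fprintf(stderr, "SetVariable(\"%s\") returned false\n",
                         kv.first.c_str());
            rejected.push_back(kv.first);
        }
    }
    return rejected;
}

// Stand-in used only to exercise the helper; it is NOT Tesseract and
// simply rejects one made-up variable name.
struct FakeApi {
    bool SetVariable(const char* name, const char* /*value*/)
    {
        return std::string(name) != "not_a_real_variable";
    }
};
```

With the code from the question, the call would look like `apply_variables(*tessaretAPI, {{"tessedit_char_whitelist", "0123456789"}, {"classify_bln_numeric_mode", "1"}})`, made once before and once after Init so the two results can be compared.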

GerfriedC commented 4 years ago

Would it be possible to provide a lean .js build on unpkg that focuses only on digit recognition and corrects for glare and odd horizon angles?