rmtheis / tess-two

Fork of Tesseract Tools for Android
Apache License 2.0

Could not initialize Tesseract API with language=ind! #235

anta40 closed 6 years ago

anta40 commented 6 years ago

I'm working on an Android app that scans Indonesian ID cards. I took ind.traineddata and put it into my internal storage: /storage/emulated/0/MyOCR/tessdata/ind.traineddata_. My device is a Galaxy Note 4 running Android 6.0.1.

Relevant code:

String DATA_PATH = Environment.getExternalStorageDirectory().toString() + "/MyOCR/";
String lang = "ind";

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.setDebug(true);
baseApi.init(DATA_PATH, lang);
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();
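A pre-flight check on the training data file can narrow down this kind of failure before init() is called. The sketch below is only an illustration: the class TessDataCheck and its methods are hypothetical names, not part of tess-two. It assumes the layout used in the snippet above, where the path passed to init() is the parent directory of tessdata. It can only confirm that the file exists, is readable, and is non-empty; it cannot tell whether the file matches the Tesseract version the library was built against.

```java
import java.io.File;

/** Hypothetical pre-flight helper; not part of the tess-two API. */
final class TessDataCheck {
    private TessDataCheck() {}

    /** Builds <dataPath>/tessdata/<lang>.traineddata, the layout assumed above. */
    static File trainedDataFile(String dataPath, String lang) {
        return new File(new File(dataPath, "tessdata"), lang + ".traineddata");
    }

    /**
     * Returns null when the file looks usable, otherwise a short reason.
     * This does not validate the file format or its Tesseract version.
     */
    static String describeProblem(String dataPath, String lang) {
        File f = trainedDataFile(dataPath, lang);
        if (!f.isFile()) {
            return "missing: " + f.getPath();
        }
        if (!f.canRead()) {
            return "not readable: " + f.getPath();
        }
        if (f.length() == 0) {
            return "empty: " + f.getPath();
        }
        return null;
    }
}
```

One possible use is to call TessDataCheck.describeProblem(DATA_PATH, lang) before baseApi.init(DATA_PATH, lang), log any non-null reason, and also check the boolean that init() returns instead of ignoring it.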

Because my device runs Android M, the storage permissions also have to be requested at runtime. In AndroidManifest.xml, I have:

<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />

and in my activity, I have:

protected boolean shouldAskPermissions() {
    return (Build.VERSION.SDK_INT > Build.VERSION_CODES.LOLLIPOP_MR1);
}

@TargetApi(23)
protected void askPermissions() {
    String[] permissions = {
        "android.permission.READ_EXTERNAL_STORAGE",
        "android.permission.WRITE_EXTERNAL_STORAGE"
    };
    int requestCode = 200;
    requestPermissions(permissions, requestCode);
}

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    if (shouldAskPermissions()){
        askPermissions();
    }
}
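Note that requestPermissions() is asynchronous: the user's answer arrives later in the activity's onRequestPermissionsResult(int, String[], int[]) callback, so any code that reads external storage should wait for that callback. The sketch below shows one way to evaluate the result array. PermissionResults and allGranted are hypothetical names introduced for illustration; the constant mirrors PackageManager.PERMISSION_GRANTED, which is 0, so the helper can be compiled without the Android SDK.

```java
/** Hypothetical helper for use inside Activity.onRequestPermissionsResult. */
final class PermissionResults {
    private PermissionResults() {}

    // Same value as android.content.pm.PackageManager.PERMISSION_GRANTED.
    static final int PERMISSION_GRANTED = 0;

    /**
     * Returns true only if every requested permission was granted.
     * An empty array means the request was interrupted and is treated
     * as a cancellation, so it counts as "not granted".
     */
    static boolean allGranted(int[] grantResults) {
        if (grantResults == null || grantResults.length == 0) {
            return false;
        }
        for (int result : grantResults) {
            if (result != PERMISSION_GRANTED) {
                return false;
            }
        }
        return true;
    }
}
```

Inside onRequestPermissionsResult, one option is to compare requestCode with the value passed to requestPermissions (200 above) and start the OCR step only when allGranted(grantResults) returns true.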

My tess-two version is 8.0.0. The app crashes when I run it. I found something interesting in the debug log:

01-23 16:19:04.749 7691-7691/app.devbyzero.net.myocr V/MyOCR: Orient: 6
01-23 16:19:04.749 7691-7691/app.devbyzero.net.myocr V/MyOCR: Rotation: 90
01-23 16:19:04.774 7691-7691/app.devbyzero.net.myocr V/MyOCR: Before baseApi
01-23 16:19:06.869 7691-7691/app.devbyzero.net.myocr E/Tesseract(native): Could not initialize Tesseract API with language=ind!

Does this mean that the Tesseract API somehow cannot read the provided training data?

rmtheis commented 6 years ago

Hmm, it looks like you're using the v4 training data, but you should be using the v3.04 training data from here: https://github.com/tesseract-ocr/tessdata/blob/3.04.00/ind.traineddata