rmtheis / tess-two

Fork of Tesseract Tools for Android
Apache License 2.0
3.76k stars, 1.38k forks

With English the API works well, but if I use the Arabic trained data the app crashes #174

Closed: yatharthgupta112 closed this issue 8 years ago

yatharthgupta112 commented 8 years ago

09-17 15:20:02.050 21768-21778/com.example.sigmaway.homeimage W/art: Suspending all threads took: 28.488ms
09-17 15:20:02.078 21768-25485/com.example.sigmaway.homeimage V/OCR: Ctesseract 1
09-17 15:20:02.085 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libjpgt.so: unused DT entry: type 0x6ffffffe arg 0x29b0
09-17 15:20:02.085 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libjpgt.so: unused DT entry: type 0x6fffffff arg 0x1
09-17 15:20:02.088 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libpngt.so: unused DT entry: type 0x6ffffffe arg 0x58e0
09-17 15:20:02.088 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libpngt.so: unused DT entry: type 0x6fffffff arg 0x2
09-17 15:20:02.093 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/liblept.so: unused DT entry: type 0x6ffffffe arg 0x231d0
09-17 15:20:02.093 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/liblept.so: unused DT entry: type 0x6fffffff arg 0x2
09-17 15:20:02.097 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libtess.so: unused DT entry: type 0x6ffffffe arg 0x67f60
09-17 15:20:02.097 21768-25485/com.example.sigmaway.homeimage W/linker: /data/app/com.example.sigmaway.homeimage-1/lib/arm64/libtess.so: unused DT entry: type 0x6fffffff arg 0x3
09-17 15:20:02.156 21768-25485/com.example.sigmaway.homeimage V/OCR: Ctesseract 2
09-17 15:20:02.157 21768-25485/com.example.sigmaway.homeimage V/OCR: Ctesseract 3
09-17 15:20:02.293 21768-25485/com.example.sigmaway.homeimage A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 25485 (AsyncTask #4)
09-17 15:20:03.802 27329-27329/com.example.sigmaway.homeimage W/art: Before Android 4.1, method android.graphics.PorterDuffColorFilter android.support.graphics.drawable.VectorDrawableCompat.updateTintFilter(android.graphics.PorterDuffColorFilter, android.content.res.ColorStateList, android.graphics.PorterDuff$Mode) would have incorrectly overridden the package-private method in android.graphics.drawable.Drawable
09-17 15:20:04.033 27329-27329/com.example.sigmaway.homeimage A/add home: tess data or Document file found
09-17 15:20:04.037 27329-27329/com.example.sigmaway.homeimage A/add home: tess data or Document file found
09-17 15:20:04.090 27329-27372/com.example.sigmaway.homeimage D/OpenGLRenderer: Use EGL_SWAP_BEHAVIOR_PRESERVED: true
09-17 15:20:04.099 27329-27329/com.example.sigmaway.homeimage D/Atlas: Validating map...

import android.content.Context;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.os.Environment;
import android.util.Log;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;

public class Ocr {
    String TAG = "OCR";
    String DATA_PATH = Environment.getExternalStorageDirectory().toString() + "/Sigmaway/";
    String[] language = {"eng", "ara"};
    Context c;
    ArrayList Pics = new ArrayList();

    // Constructor: creates DATA_PATH and DATA_PATH/tessdata/, then copies each
    // <lang>.traineddata from the app's assets if it is not already on the sdcard.
    public Ocr(Context context) {
        this.c = context;
        String[] paths = new String[] { DATA_PATH, DATA_PATH + "tessdata/" };

        for (String path : paths) {
            File dir = new File(path);
            if (!dir.exists()) {
                if (!dir.mkdirs()) {
                    Log.v(TAG, "ERROR: Creation of directory " + path + " on sdcard failed");
                    return;
                } else {
                    Log.v(TAG, "Created directory " + path + " on sdcard");
                }
            }
        }

        for (String lang : language) {
            Log.v(TAG, "hey c");

            File target = new File(DATA_PATH + "tessdata/" + lang + ".traineddata");
            if (!target.exists()) {
                try {
                    AssetManager assetManager = c.getAssets();
                    InputStream in = assetManager.open("tessdata/" + lang + ".traineddata");
                    OutputStream out = new FileOutputStream(target);

                    // Transfer bytes from in to out
                    byte[] buf = new byte[1024];
                    int len;
                    while ((len = in.read(buf)) > 0) {
                        out.write(buf, 0, len);
                    }
                    in.close();
                    out.close();

                    Log.v(TAG, "Copied " + lang + " traineddata");
                } catch (IOException e) {
                    Log.e(TAG, "Was unable to copy " + lang + " traineddata " + e.toString());
                    // Remove any partially written file; otherwise the exists() check
                    // above skips the copy next time and init() is handed a truncated file.
                    target.delete();
                }
            }
        }
    }

    public String tesseract(Context context, Bitmap bmpImg, String lang) {
        this.c = context;

        Log.v(TAG, "Ctesseract 1");
        TessBaseAPI baseApi = new TessBaseAPI();
        Log.v(TAG, "Ctesseract 2");
        baseApi.setDebug(true);
        Log.v(TAG, "Ctesseract 3");
        baseApi.init(DATA_PATH, lang); // with lang = "ara" the app aborts (SIGABRT) inside this call
        Log.v(TAG, "Ctesseract 4");
        baseApi.setImage(bmpImg);
        Log.v(TAG, "Ctesseract 5 ");
        String recognizedText = baseApi.getUTF8Text();
        Log.v(TAG, "Ctesseract 6");
        baseApi.end();

        if (lang.equalsIgnoreCase("eng")) {
            // Keep only ASCII letters and digits for English results
            recognizedText = recognizedText.replaceAll("[^a-zA-Z0-9]+", " ");
        }

        //recognizedText = recognizedText.trim();
        return recognizedText;
    }

}

This is my class, through which I run the OCR; I call the method in an AsyncTask from the main activity. I used the trained data provided in your documentation and added the dependency in Gradle with compile 'com.rmtheis:tess-two:6.0.4'. With English the API works well, but with the Arabic trained data the app crashes on the baseApi.init(DATA_PATH, lang); call with the error below:

09-17 15:20:02.293 21768-25485/com.example.sigmaway.homeimage A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 25485 (AsyncTask #4)
09-17 15:20:03.802 27329-27329/com.example.sigmaway.homeimage W/art: Before Android 4.1, method android.graphics.PorterDuffColorFilter android.support.graphics.drawable.VectorDrawableCompat.updateTintFilter(android.graphics.PorterDuffColorFilter, android.content.res.ColorStateList, android.graphics.PorterDuff$Mode) would have incorrectly overridden the package-private method in android.graphics.drawable.Drawable
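The log stops right after "Ctesseract 3", i.e. inside baseApi.init(DATA_PATH, lang), and the failure is a native SIGABRT rather than a Java exception, so nothing on the Java side reports what went wrong. One inexpensive guard, whatever the root cause turns out to be, is to check in plain Java that the file the native code is about to load exists and is non-empty, and to log a readable message if not. This is a sketch assuming the DATA_PATH/tessdata/<lang>.traineddata layout used by the class above; TessDataCheck and problemWith are hypothetical names, not part of tess-two. A passing check does not prove the file is compatible with the Tesseract version bundled in tess-two.

```java
import java.io.File;

// Hypothetical helper (not part of tess-two): checks that
// <dataPath>/tessdata/<lang>.traineddata is present and non-empty before the
// caller hands dataPath and lang to TessBaseAPI.init().
final class TessDataCheck {

    private TessDataCheck() {}

    /** Returns null if the file looks usable, otherwise a human-readable problem. */
    static String problemWith(String dataPath, String lang) {
        File tessdata = new File(dataPath, "tessdata");
        if (!tessdata.isDirectory()) {
            return "Missing directory: " + tessdata.getPath();
        }
        File trained = new File(tessdata, lang + ".traineddata");
        if (!trained.isFile()) {
            return "Missing file: " + trained.getPath();
        }
        if (trained.length() == 0) {
            return "Empty file (incomplete copy from assets?): " + trained.getPath();
        }
        return null; // exists and has content; compatibility is NOT verified
    }
}
```

In tesseract(...), this would be called just before baseApi.init(DATA_PATH, lang), logging the message and returning early when it reports a problem.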

rmtheis commented 8 years ago

Use Cube: baseApi.init(DATA_PATH, lang, OEM_CUBE_ONLY);
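The suggestion amounts to passing an engine mode as the third argument to init(...) for Arabic. In the tesseract(...) method above, that could look like baseApi.init(DATA_PATH, lang, EngineModes.forLanguage(lang)). The sketch below selects Cube for "ara" only and the default mode otherwise. EngineModes and forLanguage are hypothetical names, and the two local constants are assumed copies of TessBaseAPI.OEM_CUBE_ONLY and TessBaseAPI.OEM_DEFAULT from tess-two 6.x. Real code should reference the TessBaseAPI fields directly.

```java
// Sketch: choose the OCR engine mode per language before calling
// baseApi.init(DATA_PATH, lang, mode).
final class EngineModes {
    static final int OEM_CUBE_ONLY = 1; // ASSUMED to equal TessBaseAPI.OEM_CUBE_ONLY
    static final int OEM_DEFAULT = 3;   // ASSUMED to equal TessBaseAPI.OEM_DEFAULT

    private EngineModes() {}

    /** Cube for Arabic, as suggested in this thread; the default engine otherwise. */
    static int forLanguage(String lang) {
        return "ara".equalsIgnoreCase(lang) ? OEM_CUBE_ONLY : OEM_DEFAULT;
    }
}
```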

yatharthgupta112 commented 8 years ago

Should I use the Cube trained data for it, then?

yatharthgupta112 commented 8 years ago

By the way, I tried it with the normal ara.traineddata and it is not working. I also tried making my own trained data for ara, and that data works, so I think this trained data has some issue. baseApi.init(DATA_PATH, "ara", TessBaseAPI.OEM_CUBE_ONLY);

rmtheis commented 8 years ago

Arabic OCR is working fine for me when I run the test cases with the Cube trained data for Arabic and OEM_CUBE_ONLY.

Please reopen with the minimal code needed to reproduce the issue and the image file or test case you're using.

See also tesseract-ocr/tesseract#428.

yatharthgupta112 commented 8 years ago

Sir, which one is the Cube trained data file for Arabic?

rmtheis commented 8 years ago

@yatharthgupta112 There are several: ara.cube.*

They all need to be stored together in your data directory.
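Since the Arabic Cube data is split across several ara.cube.* files that must sit in the same data directory, a quick on-device sanity check is to list the matching files that are actually present in tessdata/ before calling init(...). CubeFiles and find are hypothetical names. The helper does not know the complete set of files a language needs; it only reports what is there, so its output should be compared against the files that were downloaded.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: list the <lang>.cube.* files found in a single tessdata directory.
final class CubeFiles {

    private CubeFiles() {}

    static List<String> find(Path tessdataDir, String lang) throws IOException {
        List<String> names = new ArrayList<>();
        // Glob: literal "<lang>.cube." followed by any file-name suffix
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(tessdataDir, lang + ".cube.*")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names); // stable order for logging and comparison
        return names;
    }
}
```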

yatharthgupta112 commented 8 years ago

@rmtheis The ara.cube.* data files worked, but the ara.traineddata file didn't. However, the results I get after OCR with the Cube data are only about 20% accurate, maybe less. Can you help or suggest how to improve the Arabic OCR results? And thank you so much, sir, for your help.
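The thread does not say how the 20% figure was measured. One common metric, used here as an assumption rather than what the reporter did, is character accuracy: one minus the Levenshtein edit distance between a hand-typed reference transcription and the OCR output, divided by the reference length. A number like this makes it possible to compare settings (engine mode, trained data) on the same test images. OcrAccuracy, editDistance and characterAccuracy are hypothetical names.

```java
// Sketch: character accuracy = 1 - (edit distance / reference length).
final class OcrAccuracy {

    private OcrAccuracy() {}

    /** Levenshtein distance, counted in UTF-16 chars (Arabic letters are one char each). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1.0 for a perfect match; clamped at 0.0 when the output is very wrong. */
    static double characterAccuracy(String reference, String ocrOutput) {
        if (reference.isEmpty()) {
            return ocrOutput.isEmpty() ? 1.0 : 0.0;
        }
        double errorRate = (double) editDistance(reference, ocrOutput) / reference.length();
        return Math.max(0.0, 1.0 - errorRate);
    }
}
```

For example, a four-character reference with one wrong character in the output scores 0.75.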

AbdelsalamHaa commented 6 years ago

Hi, I'm using Tesseract 4.00 and Leptonica 1.75.3. I used eng.traineddata on some images and it worked very well. Now I'm trying to use the same code for Arabic with ara.traineddata, but it gives weird characters. Is that due to GetUTF8Text(), or does it have nothing to do with it?

This is the same part of the code as for English; the only difference is that I changed eng.traineddata to ara.traineddata. The image is in the textImg variable.

ic.SetImage((uchar*)textImg.data, textImg.size().width, textImg.size().height, textImg.channels(), textImg.step1());
result = ic.GetUTF8Text();
ic.Clear();
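GetUTF8Text() returns the recognized text encoded as UTF-8. One thing worth ruling out when Arabic output shows up as "weird characters" (a possibility, not a confirmed diagnosis for this report) is that the bytes are being displayed or decoded as some other encoding. The Java sketch below shows the effect on a short Arabic word; Utf8Demo is a hypothetical name. For the C++ snippet above, the equivalent check is whether the console, file, or widget that displays result treats it as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Sketch: the same UTF-8 bytes decoded with the right and the wrong charset.
// Arabic letters take two bytes each in UTF-8, so decoding them with a
// single-byte charset (ISO-8859-1 here) yields two unrelated characters per letter.
final class Utf8Demo {

    private Utf8Demo() {}

    // "marhaban" ("hello") in Arabic script, written with escapes so the
    // source file's own encoding does not matter.
    static final String ARABIC = "\u0645\u0631\u062D\u0628\u0627";

    static String decodedAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodedAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```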

ibrahimAlii commented 6 years ago

@AbdelsalamHaa @yatharthgupta112 Did you find a solution for the bad accuracy?

@rmtheis Please help