Closed: HarshitD closed this issue 6 years ago
Hmm, can you try again using tess-two version 8.0.0? Hindi is working OK for me in both Tesseract and Cube modes on version 8.0.0.
Thanks for the reply. I tried version 8.0.0 but I still have the same issue.
In the app's build.gradle file, I changed the dependency to compile 'com.rmtheis:tess-two:8.0.0'
I am using this code directly: https://github.com/imperialsoup/SimpleTesseractExample
Does this code need any modification to make it work for Hindi?
I have the hin.traineddata file along with all the .cube files in the app > assets > tessdata folder.
Could you describe how to make it work for the Hindi language?
Thanks in advance!!
What's the error message that's printed to the device log when your app crashes?
Here is the error summary displayed on my Android phone: Screenshot 1, Screenshot 2.
From what I have found, I think the problem is that Hindi needs the .cube files as well, because tess-two is built on Tesseract 3, which requires them. I cannot figure out how to use these .cube files. Simply putting them in the same folder as the hin.traineddata file doesn't work.
Thanks for your help.
I can't reproduce the error that you're seeing. Make sure you're using the correct training data file, from the 3.04.00 tag of the tessdata project.
I get the following result for your input image when using the default settings (OEM_TESSERACT_ONLY and PageSegMode.PSM_SINGLE_BLOCK):
राहुल ने तंज कसते हुए कहा कि कि स'घ का उद्देश्य महिलाओं कं! असशक्त करना है. आरएसएस मैं महिलाओं की कोई जगह नहीं है. यथा कांई जानता हैं कि कोई महिला २55 से संबंधित हो और नेतृत्व कर रही हो माल अगर साप महात्मा गांधी की तस्वीर देखेंगे तो उनके दाई और बाई और महिलाओं कं! पाएंगे, मार आप मोहन भागवत की तस्वीर देखेंगे तो या तो दो अकेले होंगे या फिर पुरुषों से घिरे होंगे
राहुल गांधी ने कहा कि अगा हम अंदर की रस्ता में आते है तो हम जीएत्तटी की संरचना में बदलाव लाएंगै और इसे काफी सरल बनाएंगे. उन्होंने कहा कि कांम्रेरर में सबसे अह्म रूप से इस बात का संतुलन रखा गया है कि महिला और पुरुषों की संख्या मैं ज्यादा अंतर नहीं अम मैं मेघालय में पाती की महिलाओं की आमंहिरत करना चाहूगा कि दो पार्टी मैं शामिल हाँ त्ताब्सि हमारे षाटींमें अधिक से अधिक महिलाएं चुनी जा सकें और उन्हें नौका मिलरस्के.
Thanks for your reply. However, I still could not resolve the error. I have tried the training data file from here. That page also says that "For Arabic and Hindi you need both the traineddata file and the cube data files." I have searched the internet, and many people have faced the same problem, where the app crashes for Hindi and Arabic, but I found no answer anywhere. The closest suggestion was to put the cube data files in the same folder as the training data file, but that doesn't help either. Could you please tell me how you made it run for Hindi?
Thanks a lot for your help.
Yes, you need to install hin.* from https://github.com/tesseract-ocr/tessdata/tree/3.04.00
Thanks for reporting this issue. I've created a task (#240) for myself to improve the training data checking for Arabic and Hindi so developers get a clear error message rather than a crash when using the wrong training data files.
Thanks for your reply. I installed all the hin.* files from the link you provided, but the app still crashes. Could you tell me how you made it work for Hindi, or share the relevant code?
Thanks for your help.
The problem is solved. Thanks for your help.
The problem was in my call to TessBaseAPI.init(). As I am new to it, I didn't understand it earlier. After passing OEM_TESSERACT_ONLY as the engine mode, it worked.
Thanks a lot for your help.
Glad you were able to solve the problem!
Thanks for looking into this issue. After taking a second look at this, I want to make a note here for reference.
Arabic and Hindi OCR requires the installation of all Cube data files when using OEM_DEFAULT.
Hindi OCR also works using OEM_TESSERACT_ONLY when the hin.traineddata file is installed, and Hindi also works using OEM_CUBE_ONLY or OEM_TESSERACT_CUBE_COMBINED when the Cube data files are additionally installed.
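The note above can be restated as a small lookup. This is only a sketch in plain Java: the class HindiDataRules and its EngineMode enum are hypothetical stand-ins for the OEM_* constants named in this thread, not part of tess-two, and "hin.cube.*" is a pattern standing for the Cube files rather than an exact file list.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the Hindi data-file rules from the note above. Not part of tess-two. */
final class HindiDataRules {

    /** Hypothetical stand-ins for the OEM_* engine modes mentioned in this thread. */
    enum EngineMode { TESSERACT_ONLY, CUBE_ONLY, TESSERACT_CUBE_COMBINED, DEFAULT }

    /** Per the note, only OEM_TESSERACT_ONLY works without the Cube data files. */
    static boolean needsCubeFiles(EngineMode mode) {
        return mode != EngineMode.TESSERACT_ONLY;
    }

    /** File name patterns that should be present in the tessdata folder for Hindi. */
    static List<String> requiredFiles(EngineMode mode) {
        List<String> required = new ArrayList<>();
        required.add("hin.traineddata");   // needed in every mode
        if (needsCubeFiles(mode)) {
            required.add("hin.cube.*");    // pattern only; the advice above was to install every hin.* file
        }
        return required;
    }
}
```

For example, HindiDataRules.requiredFiles(HindiDataRules.EngineMode.DEFAULT) lists both the traineddata file and the Cube pattern, while TESSERACT_ONLY lists only hin.traineddata.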
I am trying to build an Android app for Hindi OCR using tess-two. It runs for many languages except Hindi. For Hindi, the app crashes whenever I try to scan any Hindi text. I tried OEM_TESSERACT_ONLY, OEM_TESSERACT_CUBE_COMBINED and OEM_CUBE_ONLY, each with PSM_SINGLE_BLOCK, but the app still does not work. Please suggest a solution.
Crash:
java.lang.IllegalArgumentException: Cube data files not found. See https://github.com/rmtheis/tess-two/issues/239
    at com.googlecode.tesseract.android.TessBaseAPI.init(TessBaseAPI.java:347)
    at com.googlecode.tesseract.android.TessBaseAPI.init(TessBaseAPI.java:303)
    at com.ashomok.tesseractsample.MainActivity.extractText(MainActivity.java:352)
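For the "Cube data files not found" exception above, it may help to log what is actually on the device before calling init(). The following is a rough diagnostic sketch in plain Java, not tess-two code: TessdataCheck and both of its methods are hypothetical names, and the Cube check is a heuristic that only looks for at least one "<language>.cube." file rather than verifying the full set. It assumes, as tess-two's init() does, that the data path is the parent directory of the tessdata folder.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Diagnostic sketch: report which data files for a language appear to be absent. */
final class TessdataCheck {

    /** Checks a list of file names; names may be null (File.list() on a missing folder). */
    static List<String> missingFromNames(String[] names, String language, boolean cubeNeeded) {
        boolean hasTrainedData = false;
        boolean hasCubeFile = false;
        if (names != null) {
            for (String name : names) {
                if (name.equals(language + ".traineddata")) hasTrainedData = true;
                if (name.startsWith(language + ".cube.")) hasCubeFile = true;
            }
        }
        List<String> missing = new ArrayList<>();
        if (!hasTrainedData) missing.add(language + ".traineddata");
        // Heuristic: one Cube file counts as "present"; it does not prove the set is complete.
        if (cubeNeeded && !hasCubeFile) missing.add(language + ".cube.*");
        return missing;
    }

    /** dataPath is the PARENT of the tessdata folder, i.e. the files live in dataPath/tessdata. */
    static List<String> missingInDataPath(File dataPath, String language, boolean cubeNeeded) {
        String[] names = new File(dataPath, "tessdata").list(); // null if the folder is absent
        return missingFromNames(names, language, cubeNeeded);
    }
}
```

Logging the result of TessdataCheck.missingInDataPath(dataDir, "hin", true) just before init() would show whether the files ever reached the device folder, which is a common failure when they exist only in the app's assets.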
I included the ara.cube.* files and used OEM_TESSERACT_ONLY, but the app still crashes.
I also tried OEM_CUBE_ONLY.
Summary: I am new to Tesseract and Android Studio. I am trying to build an Android app for OCR using tess-two. I was able to build it with help from the internet, and it runs for many languages except Hindi. For Hindi, the app crashes right after opening.
Expected result: Hindi should work like all the other languages.
Actual result: The app crashes when I add the hin.traineddata file and change the language to Hindi.
Tess-two version: tess-two:5.4.1
Android version: 7.1.2
Phone/device model: Xiaomi Redmi 4
Phone/device architecture (armeabi, armeabi-v7a, x86, mips, arm64-v8a, x86_64, mips64):
Link to training data used: https://github.com/tesseract-ocr/tessdata/tree/3.04.00
Link to image used as input: