rmtheis / tess-two

Fork of Tesseract Tools for Android
Apache License 2.0

Tesseract 4.0 Support? #196

Closed seantibb closed 5 years ago

seantibb commented 7 years ago

First, I love tess-two...really :). I was just reading through the tesseract-ocr wiki (https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance) and noticed there are some major performance gains with 4.0. Is there anything I can do to help update tess-two to support 4.0 as well?

Thanks!

rmtheis commented 7 years ago

Thanks. I definitely want to update to support Tesseract 4.0 for the reasons you point to. I'll need help to do it for sure, and I appreciate all the contributions from you and everyone else!

There are two things that contributors can help with right now that will help toward supporting Tesseract 4:

  1. Investigation of #197. This bug needs more info as to:

The crash is reproducible on emulators, so having a 64-bit device isn't a requirement for looking into this.

  2. Many of the changes in Tesseract 3.05 are back-ports of Tesseract 4 code, so tess-two support for Tesseract 3.05 will be a step in the right direction toward supporting Tesseract 4. When I have a chance I plan to upload a branch that's a work in progress for supporting Tesseract 3.05. I'll be needing some help getting that branch working. I plan to update this issue when I upload that branch.

rmtheis commented 7 years ago

Update: I've pushed code to the master branch that runs Tesseract 3.05.00. The problems I had been having with an earlier version of the Tesseract code have been resolved. I plan to make a release on Bintray/JCenter with these new changes soon.

rmtheis commented 7 years ago

Update: The Tesseract 3.05.00 code has been released in tess-two 6.3.0.

I have pushed a branch called tesseract4 that's a work in progress for Tesseract 4.0. It builds, but it's not working as of right now.

jasonwedepohl commented 7 years ago

Tesseract 4.0's LSTM is "much more memory-intensive" according to the doc on accuracy and performance. I can't find the specs of the test machine, but it is possible that the memory constraints of most mobile devices will slow down the engine. I did read somewhere that the plan is to mark the original Tesseract engine as obsolete, so I hope that LSTM can really perform better on devices with 1 to 2 GB of RAM.

rmtheis commented 7 years ago

06-20 20:54:51.936 1354-1354/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
06-20 20:54:51.936 1354-1354/? A/DEBUG: Build fingerprint: 'Android/sdk_google_phone_x86/generic_x86:6.0/MASTER/3738108:userdebug/test-keys'
06-20 20:54:51.936 1354-1354/? A/DEBUG: Revision: '0'
06-20 20:54:51.936 1354-1354/? A/DEBUG: ABI: 'x86'
06-20 20:54:51.936 1354-1354/? A/DEBUG: pid: 3428, tid: 3441, name: ationTestRunner  >>> com.googlecode.tesseract.android.test <<<
06-20 20:54:51.936 1354-1354/? A/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
06-20 20:54:51.938 1354-1354/? A/DEBUG:     eax 00000000  ebx 00000d64  ecx 00000d71  edx 00000006
06-20 20:54:51.938 1354-1354/? A/DEBUG:     esi ae7b9980  edi 00000002
06-20 20:54:51.938 1354-1354/? A/DEBUG:     xcs 00000073  xds 0000007b  xes 0000007b  xfs 0000004f  xss 0000007b
06-20 20:54:51.938 1354-1354/? A/DEBUG:     eip b72dbf26  ebp 00000d71  esp ae7b83e0  flags 00200202
06-20 20:54:51.951 1354-1354/? A/DEBUG: backtrace:
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #00 pc 00083f26  /system/lib/libc.so (tgkill+22)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #01 pc 000815f8  /system/lib/libc.so (pthread_kill+70)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #02 pc 00027205  /system/lib/libc.so (raise+36)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #03 pc 000209e4  /system/lib/libc.so (abort+80)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #04 pc 0012b127  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+263)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #05 pc 000fdb6a  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::ImageData::PreScale(int, int, float*, int*, int*, GenericVector<TBOX>*) const+138)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #06 pc 0018dba6  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Input::PrepareLSTMInputs(tesseract::ImageData const&, tesseract::Network const*, int, tesseract::TRand*, float*)+70)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #07 pc 00195edb  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::LSTMRecognizer::RecognizeLine(tesseract::ImageData const&, bool, bool, bool, float, float*, tesseract::NetworkIO*, tesseract::NetworkIO*)+155)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #08 pc 00195621  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::LSTMRecognizer::RecognizeLine(tesseract::ImageData const&, bool, bool, double, bool, UNICHARSET const*, TBOX const&, float, bool, tesseract::PointerVector<WERD_RES>*)+705)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #09 pc 000b852a  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::LSTMRecognizeWord(BLOCK const&, ROW*, WERD_RES*, tesseract::PointerVector<WERD_RES>*)+426)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #10 pc 000a0f45  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::classify_word_pass1(tesseract::WordData const&, WERD_RES**, tesseract::PointerVector<WERD_RES>*)+117)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #11 pc 0009df0a  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::RetryWithLanguage(tesseract::WordData const&, void (tesseract::Tesseract::*)(tesseract::WordData const&, WERD_RES**, tesseract::PointerVector<WERD_RES>*), bool, WERD_RES**, tesseract::PointerVector<WERD_RES>*)+170)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #12 pc 00098935  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::classify_word_and_language(int, PAGE_RES_IT*, tesseract::WordData*)+453)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #13 pc 00099666  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::RecogAllWordsPassN(int, ETEXT_DESC*, PAGE_RES_IT*, GenericVector<tesseract::WordData>*)+774)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #14 pc 0009ac80  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+464)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #15 pc 0008544a  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::TessBaseAPI::Recognize(ETEXT_DESC*)+890)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #16 pc 00083d56  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::TessBaseAPI::GetUTF8Text()+70)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #17 pc 0026ba0d  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetUTF8Text+77)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #18 pc 00022d2c  /data/app/com.googlecode.tesseract.android.test-1/oat/x86/base.odex (offset 0x12000) (java.lang.String com.googlecode.tesseract.android.TessBaseAPI.nativeGetUTF8Text(long)+128)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #19 pc 00024fb5  /data/app/com.googlecode.tesseract.android.test-1/oat/x86/base.odex (offset 0x12000) (java.lang.String com.googlecode.tesseract.android.TessBaseAPI.getUTF8Text()+185)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #20 pc 000171b6  /data/app/com.googlecode.tesseract.android.test.test-1/oat/x86/base.odex (offset 0xe000) (void com.googlecode.tesseract.android.test.TessBaseAPITest.testChoiceIterator()+378)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #21 pc 00137a82  /system/lib/libart.so (art_quick_invoke_stub+338)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #22 pc 001435c4  /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+212)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #23 pc 0050f858  /system/lib/libart.so (art::InvokeMethod(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jobject*, _jobject*, unsigned int)+1736)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #24 pc 0048c5e3  /system/lib/libart.so (art::Method_invoke(_JNIEnv*, _jobject*, _jobject*, _jobject*)+80)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #25 pc 72a3aca4  /data/dalvik-cache/x86/system@framework@boot.oat (offset 0x1eb2000)
06-20 20:54:51.999 1354-1354/? A/DEBUG: Tombstone written to: /data/tombstones/tombstone_00

kirantpatil commented 7 years ago

Hi All,

Any updates on this issue ?

kirantpatil commented 7 years ago

Can we use tess-two with Tesseract 4.0 ?

amin1985 commented 7 years ago

In Tesseract 4, dotproductsse.cpp and dotproductavx.cpp (https://github.com/tesseract-ocr/tesseract/blob/197b89b6ac8ca61c0feeb88479cecea6600b8733/arch/dotproductavx.cpp) contain lines such as:

fprintf(stderr, "DotProductAVX can't be used on Android\n");

So the code says that AVX and SSE can't be used on Android. What is AVX? Intel® Advanced Vector Extensions (Intel® AVX) provides 256-bit vector instructions. It is a hardware feature of the CPU architecture, available on Intel and AMD CPUs: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

As far as I know, no Android CPUs support it (Intel doesn't allow it). That would mean LSTM will not be available on Android, or, if it is, only on new devices or once an LSTM OCR optimized for Android is released (if that is possible).

Or maybe I am wrong about this?
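
As a rough illustration of the point above: SSE and AVX are x86 instruction-set extensions, and a compiler advertises which one it is targeting through predefined macros such as the `__SSE4_1__` guard seen in the Tesseract sources. The sketch below only reports what the current build targets. The function name `CompiledSimdTarget` is hypothetical, and the macro names are the ones GCC and Clang define (other compilers may differ).

```cpp
#include <string>

// Hypothetical helper: reports which SIMD extension this translation unit
// was compiled for, based on macros predefined by GCC/Clang.
std::string CompiledSimdTarget() {
#if defined(__AVX__)
  return "AVX";        // x86 with -mavx (or a -march that implies it)
#elif defined(__SSE4_1__)
  return "SSE4.1";     // x86 with -msse4.1
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  return "NEON";       // ARM builds with NEON enabled
#else
  return "none";       // no SIMD extension detected at compile time
#endif
}
```

On a typical ARM Android build one would expect this to report "NEON" or "none" rather than an x86 extension, which is why x86-only kernels have to be stubbed out or replaced there.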

Wikinaut commented 7 years ago

Have you tried to compile and build the recent Tesseract 4.0 https://github.com/tesseract-ocr/tesseract version?

ruthloeser commented 6 years ago

Hi, I am trying to compile and run Tesseract 4, and I get:

I/Tesseract(native): Initialized Tesseract API with language=eng
A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 23391 (le.tess_two_app)

Any idea what is causing this error?

rezaee commented 6 years ago

When will tess-two support tesseract4?

magamine commented 6 years ago

any news for tesseract 4 ?

avielas commented 6 years ago

what about tesseract 4 ?

nirajan-pant commented 6 years ago

I found Tesseract OCR iOS, a framework for iOS 7+ that is also compiled for armv7s and arm64. A fork that updates the Tesseract version to 4.00.00alpha is at https://github.com/chaoskyme/Tesseract-OCR-iOS

Will this help to figure out the compile issues for Android?

avielas commented 6 years ago

Sounds interesting, but I don't think it helps, because the major challenge is the JNI interface, which exists only on the Android side.

rezaee commented 6 years ago

Dear @rmtheis, thanks for your great work. As a beginner to mid-level programmer, I don't know exactly how I can help with porting Tesseract 4 to Android. Maybe if you could explain in more detail, or break the project down into some smaller tasks, we could help you get it done sooner.

ghost commented 6 years ago

I'd like to contribute too, but this is my first time and my first post here, and I don't know how to do that.

AbdelsalamHaa commented 6 years ago

Has any of you managed to use tess-two with Tesseract 4 so far? Is there any way to get Tesseract 4.0 to work with Android? Thank you so much.

ghost commented 6 years ago

Maybe the owner has left the project?

hejin commented 6 years ago

Hi guys, I think we may have been asking too much of the project contributors.

LSTM/RNN inference performance and resource optimization on mobile/embedded platforms is not the piece of cake one might suppose.

For those who wish to contribute, my suggestion is to first get the latest stable release (Tesseract v3.05) running with pure JNI/C++ code on Android. This project's code, by @rmtheis and others, already provides enough how-to information. They have no duty to answer every question, since it's an OPEN SOURCE project! Let's appreciate the great work by @rmtheis et al.

rmtheis commented 6 years ago

I don't know when I'll have time to work on updating this project to use the Tesseract 4 beta. If anyone wants to take this task on, please have at it!

One smaller (but still pretty big) task that would help toward that effort would be to make a pull request that gets Travis CI working on this project. What I have in mind is a Travis configuration that builds the project and then runs the instrumented tests on emulators for armv7, armv8, x86, and x86-64.

avielas commented 6 years ago

I checked out the tesseract4 branch (from the tess-two repository) and successfully ran './gradlew assemble', with the tests passing at 89% accuracy. Can I now use tess-two with Tesseract 4 support? If not, what else do I need to do to get this support in my Android app? I also ran my application's tests with the compiled tess-two (tesseract4 branch), but I get exactly the same results as with the master branch.

avielas commented 6 years ago

@rmtheis can you please answer my question?

hadar-ayoub commented 6 years ago

Hi,

I think we need a list of remaining tasks to integrate completely the tesseract 4 on this library.

@rmtheis What can i do to contribute on it?

Regards, Ayoub

rmtheis commented 6 years ago

Currently the tesseract4 branch builds successfully with NDK r16b, and the legacy OEM mode 0 works, but I'm seeing the following crash when running with v4 training data and the LSTM OEM mode 1:

2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: Build fingerprint: 'google/sdk_gphone_x86/generic_x86:9/PPP4.180612.007/4860066:userdebug/dev-keys'
2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: Revision: '0'
2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: ABI: 'x86'
2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: pid: 8924, tid: 8940, name: ationTestRunner  >>> com.googlecode.tesseract.android.test <<<
2018-08-12 13:46:25.745 8943-8943/? A/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
2018-08-12 13:46:25.745 8943-8943/? A/DEBUG:     eax 00000000  ebx 000022dc  ecx 000022ec  edx 00000006
2018-08-12 13:46:25.745 8943-8943/? A/DEBUG:     edi 000022dc  esi d597f1b0
2018-08-12 13:46:25.746 8943-8943/? A/DEBUG:     ebp 00000000  esp d597f168  eip f2b94b59
2018-08-12 13:46:25.782 8943-8943/? A/DEBUG: backtrace:
2018-08-12 13:46:25.782 8943-8943/? A/DEBUG:     #00 pc 00000b59  [vdso:f2b94000] (__kernel_vsyscall+9)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #01 pc 0001fdf8  /system/lib/libc.so (syscall+40)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #02 pc 00022ed3  /system/lib/libc.so (abort+115)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #03 pc 00145cea  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+266)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #04 pc 000e3502  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (_ZN9tesseract9Tesseract24init_tesseract_lang_dataEPKcS2_S2_NS_13OcrEngineModeEPPciPK13GenericVectorI6STRINGESA_bPNS_15TessdataManagerE+1170)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #05 pc 000e3d9e  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (_ZN9tesseract9Tesseract14init_tesseractEPKcS2_S2_NS_13OcrEngineModeEPPciPK13GenericVectorI6STRINGESA_bPNS_15TessdataManagerE+606)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #06 pc 0008448a  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (_ZN9tesseract11TessBaseAPI4InitEPKciS2_NS_13OcrEngineModeEPPciPK13GenericVectorI6STRINGESA_bPFbRKS7_PS6_IcEE+474)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #07 pc 0008429b  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (_ZN9tesseract11TessBaseAPI4InitEPKcS2_NS_13OcrEngineModeEPPciPK13GenericVectorI6STRINGESA_b+107)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #08 pc 002a80f4  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeInitOem+100)

wolfhe commented 6 years ago

@rmtheis What's your testing environment? If it's running on Android/ARM instead of an x86 emulator, I suspect there are some issues in the project build settings, because the stack trace shows it's running x86 code.

wolfhe commented 6 years ago

Hi guys, since many of us are interested in the 4.0 work, why not try to build and run it and report issues here? The steps might look like this:

  1. Read the tess-two wiki page and try to build it with Tesseract 4.0 beta.x.
  2. Run your self-built tess-two on real Android phones of various versions with the traditional OCR engine, and report issues here.
  3. Run your self-built tess-two on real Android phones of various versions with the fancy LSTM engine, and report issues here.

If we can just make the LSTM engine run on an Android phone, even without any architecture-specific optimization (e.g. hand-written NEON code, the ARM counterpart of SSE/AVX on x86), it would be a great leap ahead.

comments?

hejin commented 6 years ago

The ANDROID_BUILD macro in tess-two/jni/com_googlecode_tesseract_android/src/ looks problematic.

The current tess-two build (branch tesseract4.0) doesn't define this macro, so the LSTM code is enabled for the real Android build.

The lucky thing is that there is some defensive coding in the tesseract/arch sources that simply stubs out the x86 SSE/AVX optimizations at compile time:

// from dotproductsse.cpp

#if !defined(__SSE4_1__)
// This code can't compile with "-msse4.1", so use dummy stubs.

#include "dotproductsse.h"
#include <stdio.h>
#include <stdlib.h>

namespace tesseract {
double DotProductSSE(const double* u, const double* v, int n) {
  fprintf(stderr, "DotProductSSE can't be used on Android\n");
  abort();
}
int32_t IntDotProductSSE(const int8_t* u, const int8_t* v, int n) {
  fprintf(stderr, "IntDotProductSSE can't be used on Android\n");
  abort();
}
}  // namespace tesseract

#else  // !defined(__SSE4_1__)
// Non-Android code here

I'm not sure whether these abort() calls are what people observe at run time when trying to launch tess-two with the LSTM engine on Android.

@rmtheis
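
To make the concern above concrete, here is a minimal sketch of how a build could keep such aborting stubs unreachable: pick the dot-product routine once, and fall back to a portable loop whenever the SIMD path is not available. This is an assumption-laden illustration, not Tesseract's actual dispatch code; `DotProductGeneric`, `DotProductSSEStub`, and `SelectDotProduct` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Portable scalar dot product that compiles on any architecture.
double DotProductGeneric(const double* u, const double* v, int n) {
  assert(n >= 0);
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

// Stand-in for the kind of stub quoted above: it aborts if it is ever called
// on a build without SSE support.
double DotProductSSEStub(const double*, const double*, int) {
  std::fprintf(stderr, "DotProductSSE can't be used on this platform\n");
  std::abort();
}

using DotProductFn = double (*)(const double*, const double*, int);

// Hypothetical selector: hands out the SIMD entry point only when the caller
// reports that SSE is available; otherwise returns the portable version.
DotProductFn SelectDotProduct(bool sse_available) {
  return sse_available ? &DotProductSSEStub : &DotProductGeneric;
}
```

With `SelectDotProduct(false)`, the returned function is the scalar loop, so the stub's abort() would never be reached. If a SIGABRT does show up at run time, that suggests the selection went down the wrong path or the abort comes from somewhere else.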

wolfhe commented 6 years ago

@hejin does this mean the LSTM feature was intended to be disabled in android?

hejin commented 6 years ago

Yep, it looks like the Tesseract 4.0 authors didn't want to enable the LSTM feature on Android too early, because of potential resource exhaustion, so they use the ANDROID_BUILD macro to disable LSTM temporarily. However, the tess-two JNI build doesn't appear to define the ANDROID_BUILD macro (please correct me if I'm wrong, @rmtheis), so the LSTM feature is enabled in the tess-two tesseract4.0 branch. As a defensive measure against x86 AVX/SSE instructions being wrongly used on ARM platforms, the authors of the optimized LSTM operators replaced those subroutines with a call to abort() for the case where they are unexpectedly invoked.

Robyer commented 5 years ago

The final version of Tesseract 4.0 was released a few weeks ago. Is there any new progress, or an estimate of when it will be integrated into tess-two?

EDIT: Someone said here that he was able to compile Tesseract for Android (without tess-two) - https://groups.google.com/d/msg/tesseract-ocr/zuZYuz12oQc/VCavzreVCQAJ

rmtheis commented 5 years ago

@Robyer I won't have time to update tess-two for Tesseract 4.0 anytime soon. This project is in need of someone familiar with C++ to take this task on! I'm happy to review and test proposed changes. Please don't hesitate to contribute yourself if you're at all inclined to do so -- your past contributions have been hugely helpful.

I'm not sure what to make of the linked comment about the cmake build. Please share your results if you end up looking into that approach.

Robyer commented 5 years ago

@rmtheis Will you have time to help me understand the current build configuration that you use for native code? I tried to rework building your native code to standard ndkBuild in Gradle (I wanted to have proper native code completion and debugging in Android Studio) by removing your custom tasks, specifying jni.srcDirs = ['jni'] sourceset and adding this into tess-two build.gradle file (and similar to eyes-two):

android {
    externalNativeBuild {
        ndkBuild {
            path file('jni/Android.mk')
        }
    }
}

but there were some errors with references to liblept. It seems both tess-two and eyes-two depend on Leptonica, but tess-two also depends on eyes-two. The problem is that eyes-two can't compile Leptonica itself; it expects the prebuilt Leptonica library that is compiled by tess-two. So it's a kind of circular reference that works only in your manual compilation.

I think we should separate leptonica into its own module and then make tess-two and eyes-two modules directly dependent on leptonica module. But I don't understand the Android.mk files and the sources enough to easily do that. Perhaps you can help with that?

So far I have prepared PR #256 to make the project work properly in the latest Android Studio. If you look at https://github.com/Robyer/tess-two/commit/572c2f1a12a298728b98acb6d87846ddc7de26f0 you will see the changes to use ndkBuild in Gradle, but the Android.mk/Application.mk files need to be modified to make it compile. It doesn't know how to compile liblept.so, which is needed by libhydrogen.so.

rmtheis commented 5 years ago

@Robyer Agreed that using externalNativeBuild would be better than the custom task calling out to the command line. I ended up using the command line approach after giving up on getting externalNativeBuild to work. I don't recall what the sticking point was at the time.

I'm not aware of anywhere that the tess-two module depends on the eyes-two module, and the intent is to not have that type of circular dependency. I agree that it would be a better design to have Leptonica as a separate module, but the overall legacy project structure is so time-consuming for me to rearrange that I'd be reluctant to take that project on. Like you mention, it probably would require substantial changes to the Android.mk files and so on.

When I try building your ndkBuildGradle branch, I see the issue you mentioned with libhydrogen and liblept. I'm not sure how to resolve that issue. When I remove the eyes-two module from the project and try again, it starts building but then fails with the mystery error make (e=87): The parameter is incorrect. By the way, I've been using NDK r16b, which is an older version.

rhardih commented 5 years ago

I've used tess-two in the past, but since going native and basically only needing the .so files, I've switched to a more direct way of building tesseract, just using the sdk/ndk.

I'm not sure if this information is directly transferable to the build issues of tess-two, but just in case, I've got a working build chain for Tesseract 4.0.0, that might help as an example?:

https://github.com/rhardih/bad/blob/master/tesseract/tesseract-4.0.0.Dockerfile

It obviously depends on Leptonica as well, which is also included:

https://github.com/rhardih/bad/blob/master/leptonica/leptonica.Dockerfile

If this is completely unhelpful, please disregard. :)

zsmartercn commented 5 years ago

Hi @all, we have ported Tesseract 4.0 (final) to Android based on tess-two and rewritten the dot product function with ARM NEON. The project also includes a full OCR demo app. Please see https://github.com/zsmartercn/Tess4Android.
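
For readers unfamiliar with the idea, a NEON dot product might look roughly like the sketch below. It is not taken from the Tess4Android project; the function name `DotProductNeonOrScalar` is hypothetical, and the vector path assumes a 64-bit ARM (AArch64) target, where NEON supports double precision. On any other target the code falls through to a plain scalar loop.

```cpp
#include <cassert>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Dot product of two double arrays. Uses AArch64 NEON when available and a
// scalar loop otherwise (and for any leftover element when n is odd).
double DotProductNeonOrScalar(const double* u, const double* v, int n) {
  assert(n >= 0);
  double total = 0.0;
  int i = 0;
#if defined(__aarch64__)
  float64x2_t acc = vdupq_n_f64(0.0);  // two running partial sums
  for (; i + 2 <= n; i += 2) {
    // acc += u[i..i+1] * v[i..i+1], lane by lane (fused multiply-add)
    acc = vfmaq_f64(acc, vld1q_f64(u + i), vld1q_f64(v + i));
  }
  total = vaddvq_f64(acc);  // add the two lanes together
#endif
  for (; i < n; ++i) total += u[i] * v[i];  // scalar path / tail
  return total;
}
```

Because the vector path adds partial sums in a different order than the scalar loop, results for arbitrary floating-point inputs can differ in the last bits between the two paths.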

Robyer commented 5 years ago

@zsmartercn Hi, is it intentional that you squashed all your changes into a single first commit? That makes it practically impossible to cherry-pick potential fixes or changes back to the tess-two repository. Perhaps you could make pull requests with the important changes that could benefit tess-two users?

Robyer commented 5 years ago

Success!

I created a new Android Studio project from scratch so I could use the default directory structure and configure CMake instead of ndkBuild, and after various changes I'm finally able to successfully compile and use Tesseract 4.0, even with LSTM (it seems). Debugging, code completion and other things also work nicely in Android Studio 3.3.

Because of the completely reworked project structure I won't be able to provide PR for tess-two though. After I clean my code and changes, I will publish it as a separate repository.

rmtheis commented 5 years ago

Excellent--thanks @zsmartercn and @Robyer, for your contributions to open source. I'm looking forward to trying out your projects, and I'll plan to merge your changes for Tesseract 4 support back into this project when I have some time.

AmitPrajapati1902 commented 5 years ago

@Robyer When will you publish your latest code with the CMake build? Please provide some details on how to convert the current @zsmartercn repo to a CMake-based build.

Robyer commented 5 years ago

Here it is! https://github.com/adaptech-cz/Tesseract4Android 🎉

Note eyes-two is not included yet. Monitor changes from tess-two are not implemented either - it should be reworked to use PROGRESS_FUNC2 instead of editing PROGRESS_FUNC and ETEXT_DESC directly.

@rmtheis Why does your tesseract4 branch have this "Add hack to handle log2" commit? What does it do?

AmitPrajapati1902 commented 5 years ago

@Robyer Thanks man, it works great.

rmtheis commented 5 years ago

@Robyer log2 was unavailable, so that commit manually replaced instances of that method call with a replacement that's mathematically equivalent in order to get the code to build, similar to what the Tesseract 4 code now has here: https://github.com/tesseract-ocr/tesseract/blob/9fd8f471f371117c2e5dff5474495218fba63e8c/src/lstm/weightmatrix.cpp#L29
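
As an illustration of what a "mathematically equivalent" replacement can look like, the sketch below uses the change-of-base identity log2(x) = ln(x) / ln(2). It is a generic example under that assumption, not a copy of the commit or of the linked Tesseract line, and the name `Log2Compat` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Base-2 logarithm built from the natural logarithm, for toolchains whose
// math library lacks log2. Valid for x > 0.
double Log2Compat(double x) {
  assert(x > 0.0);
  return std::log(x) / std::log(2.0);
}
```

The result can differ from a native log2 in the last bit or so because of the extra division, which is normally irrelevant for this kind of use.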

Robyer commented 5 years ago

@rmtheis I see, that explains why I didn't experience the missing log2 problem myself. Thanks.

irann93 commented 5 years ago

@zsmartercn @Robyer Thanks for the effort. It works!!

ygyin-ivy commented 5 years ago

I fixed this issue using Leptonica 1.76.0 (it should not be 1.74.*) and Tesseract 4.0.0: https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/4.0.0

My Android.mk for Tesseract, which enables the LSTM model, is in Android.zip.

EXPLICIT_SRC_EXCLUDES should include fileio.cpp (used for training) to remove the dependency on glob.c; alternatively, download a local copy of glob.c.

When building on Windows, the maximum path length should be < 251, so I renamed comgooglecode_android to

The APK runs correctly on phones with API 19 through API 23 (armeabi-v7a).

alexcohn commented 5 years ago

See also https://github.com/alexcohn/tess-two/tree/4.1

denzerd commented 5 years ago

Hi,

great work. A small suggestion, perhaps it would be nice to put some warning on the front page/README of this project in order to inform that there is a different repo with Tesseract-4.1.0 available. I wasted a lot of hours today because I got different results between this project and the command line, until I finally realised that the versions are different.

Best regards

rmtheis commented 5 years ago

I'm wrapping up the maintenance on this repo and I don't plan on making updates in the future. Note that updates to support Tesseract 4.0 have been made on other forks of this repo such as https://github.com/alexcohn/tess-two/tree/4.1.

Thanks everyone, for your interest and support!