sh1r0 / caffe-android-lib

Porting caffe to android platform

Performance issues about using multiple threads to do prediction simultaneously (see comments) #53

Open x0chen opened 8 years ago

x0chen commented 8 years ago

The build succeeds with other ANDROID_ABI values, but with armeabi the following errors occur:

[ 21%] Building C object 3rdparty/libwebp/CMakeFiles/libwebp.dir/cpu-features/cpu-features.c.o
In file included from /usr/caffe-android-lib/opencv/3rdparty/libwebp/cpu-features/cpu-features.c:61:0:
/usr/android-ndk-r10e/platforms/android-21/arch-arm/usr/include/machine/cpu-features.h:52:6: error: #error Unknown or unsupported ARM architecture
 #error Unknown or unsupported ARM architecture
  ^
[ 22%] Building C object 3rdparty/libpng/CMakeFiles/libpng.dir/pngwrite.c.o
make[2]: *** [3rdparty/libwebp/CMakeFiles/libwebp.dir/cpu-features/cpu-features.c.o] error 1
make[1]: *** [3rdparty/libwebp/CMakeFiles/libwebp.dir/all] error 2

What should I do? Thank you very much.

sh1r0 commented 8 years ago

Hi @x0chen, which cmake version did you use?

x0chen commented 8 years ago

cmake version 3.3.0-rc4

sh1r0 commented 8 years ago

Hi @x0chen, as there have been some issues related to the cmake version (e.g., #45), I recommend upgrading your cmake to either 3.3.2 or 3.5.2 (both work fine in my case).

x0chen commented 8 years ago

Thank you! @sh1r0 I found another problem which is more urgent, so I will try other cmake versions later.

I want to process several images simultaneously with a CNN using multiple threads. My approach is to create several (e.g. 5) Net instances of the same CNN and use them to predict 5 images simultaneously. This works well on my PC: predicting 1 image with my net takes about 170 ms. When predicting 5 images simultaneously in multiple threads, each prediction takes about 220 ms, and the total time is less than 280 ms.

But when running on the Android platform, a PROBLEM appears: predicting 1 image takes about 500 ms, and predicting 5 images sequentially takes about 2500 ms. When predicting 5 images simultaneously in multiple threads, the time for each prediction increases to about 7500 ms, so the total time is more than 8000 ms, which is more than 3 times slower than processing them sequentially in one thread.

I faced this problem before when using multi-threaded prediction on my PC. That was because I used an old version of Caffe, in which the Caffe class is a singleton. When I updated to the newest version, the problem was solved. But the Caffe version in your project is already the newest.

Do you have any suggestions? Thanks!
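The singleton point above can be illustrated with a minimal sketch. It is not Caffe's code: `Context`, `shared_context`, `per_thread_context` and `distinct_instances` are hypothetical names used only to show the difference between one process-wide object that all threads share and one object per thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for a framework-wide context object (not Caffe's class).
struct Context {
    int mode = 0;
};

// Process-wide singleton: every thread receives the same object.
Context& shared_context() {
    static Context instance;
    return instance;
}

// Per-thread instance: each thread receives its own object.
Context& per_thread_context() {
    thread_local Context instance;
    return instance;
}

// Calls `getter` on `n` threads and counts how many distinct objects they saw.
// All threads stay alive until every one has reported, so that per-thread
// storage cannot be reused by a later thread and skew the count.
template <typename Getter>
std::size_t distinct_instances(Getter getter, int n) {
    std::set<const Context*> seen;
    std::mutex m;
    std::condition_variable cv;
    int arrived = 0;

    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&] {
            const Context* p = &getter();
            std::unique_lock<std::mutex> lock(m);
            seen.insert(p);
            ++arrived;
            cv.notify_all();
            cv.wait(lock, [&] { return arrived == n; });
        });
    }
    for (auto& w : workers) w.join();
    return seen.size();
}
```

Calling `distinct_instances(shared_context, 5)` reports a single object, while `distinct_instances(per_thread_context, 5)` reports five. Any mutable state or locking attached to the shared object is contended by all threads, which is one way a singleton can hurt multi-threaded prediction.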

mmx110 commented 8 years ago

Hey, how did you solve your first problem? My cmake version is 3.2.2, but I hit the same problem! Thank you!

sh1r0 commented 8 years ago

Hi @x0chen, which BLAS library did you use? Eigen or OpenBLAS?

x0chen commented 8 years ago

@sh1r0 I tried both OpenBLAS and Eigen, and got the same problem with each.

sh1r0 commented 8 years ago

In which component did you do the multi-threading, Java or JNI?

x0chen commented 8 years ago

JNI

sh1r0 commented 8 years ago

@x0chen Would you mind showing the relevant snippet? Thanks.

x0chen commented 8 years ago

@sh1r0 Here is a test JNI function, testFunc; testTask is the task submitted to the thread pool:

void testTask(std::shared_ptr<Net<float> > _net){
    long t1 = get_current_time();

    Mat im = imread("/storage/emulated/0/CharRecog/test.jpg", 0);

    int predIdx = recog_by_cnn(im, _net, Size(128, 128));   // Recognize the test image by _net

    long t2 = get_current_time();
    LOGD("Pred: %d\t Time consume: %ld ms\t", predIdx, t2 - t1);  // %ld: t2 - t1 is a long
}

JNIEXPORT void JNICALL Java_com_xchen_idcard_1detect_1recog_IDCardDetectRecog_testFunc(
    JNIEnv *env, jobject thiz) {

    LOGD("----Start initializing multi thread----");
    vector<std::shared_ptr<Net<float> > > netVecs_;
    for(int i = 0; i < 5; i++){     // Initialize the net five times and push each instance into netVecs_
        std::shared_ptr<Net<float> > netChar_(new Net<float>("/storage/emulated/0/CharRecog/models/char/deploy.prototxt", TEST));
        netChar_->CopyTrainedLayersFrom("/storage/emulated/0/CharRecog/models/char/param.caffemodel");      
        netVecs_.push_back(netChar_);
    }
    LOGD("----Start test multi thread----");
    long t1 = get_current_time();
    // A self-written ThreadPool
    std::shared_ptr<trantor::TrantorFixedThreadPool> _threadPool(new trantor::TrantorFixedThreadPool(5));

    // Push five tasks to the ThreadPool, one per Net instance
    for(int i = 0; i < 5; i++) _threadPool->pushTask(std::bind(testTask, netVecs_[i]));

    // Wait until all tasks finish
    _threadPool->waitUntilFinished();
    long t2 = get_current_time();
    LOGD("All tasks time consume: %ld ms", t2 - t1);  // %ld: t2 - t1 is a long

}
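
To rule out the self-written thread pool as the source of the slowdown, the same measurement can be repeated with plain `std::thread`. The following is a minimal sketch under that assumption, not the code used in the snippet above: `Timing` and `run_concurrently` are hypothetical names, and each task would wrap one call such as `testTask(netVecs_[i])`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Result of one benchmark run, in milliseconds.
struct Timing {
    std::vector<long long> per_task_ms;  // wall time of each task
    long long total_ms = 0;              // wall time from first launch to last join
};

// Runs each task on its own std::thread and records per-task and total wall time.
Timing run_concurrently(const std::vector<std::function<void()>>& tasks) {
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::milliseconds;

    Timing result;
    result.per_task_ms.assign(tasks.size(), 0);
    std::vector<std::thread> workers;
    workers.reserve(tasks.size());

    const auto start = Clock::now();
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&, i] {
            const auto t1 = Clock::now();
            tasks[i]();
            const auto t2 = Clock::now();
            result.per_task_ms[i] = duration_cast<milliseconds>(t2 - t1).count();
        });
    }
    for (auto& w : workers) w.join();
    result.total_ms = duration_cast<milliseconds>(Clock::now() - start).count();
    return result;
}
```

Running it once with a single task and once with five gives the per-prediction and total times in the same form as the numbers reported above, which makes it easier to compare the PC and the Android device.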

blueardour commented 8 years ago

@sh1r0

Hi, I hit a similar problem when building the gflags and glog libs with ANDROID_ABI set to 'arm64-v8a'. If I set it to 'armeabi-v7a with NEON', everything is OK.

See the log below for 'arm64-v8a'. It seems the include path for the NDK toolchain goes wrong. Many basic headers are not found.

`-- Looking for unwind.h - not found
-- Looking for C++ include ext/hash_map
CMake Deprecation Warning at /usr/local/share/cmake-3.6/Modules/CMakeForceCompiler.cmake:79 (message):
  The CMAKE_FORCE_C_COMPILER macro is deprecated. Instead just set CMAKE_C_COMPILER and allow CMake to identify the compiler.
Call Stack (most recent call first):

-- Looking for unistd.h - not found
-- Looking for unwind.h
CMake Deprecation Warning at /usr/local/share/cmake-3.6/Modules/CMakeForceCompiler.cmake:79 (message):
  The CMAKE_FORCE_C_COMPILER macro is deprecated. Instead just set CMAKE_C_COMPILER and allow CMake to identify the compiler.`

sh1r0 commented 8 years ago

@blueardour Please check the versions of cmake and ndk. Also, armeabi-v7a with NEON is actually not supported.

blueardour commented 8 years ago

cmake --version: 3.6.0-rc4. Should this be exactly 3.3.2 or 3.5.2 as you mentioned before? ndk version: r10d

sh1r0 commented 8 years ago

@blueardour It does not have to be exactly the same, but I recommend giving it a try.

blueardour commented 8 years ago

@sh1r0 Hi, I just installed cmake 3.5.2 (cmake --version reports: cmake version 3.5.2). When I do not set ANDROID_ABI and run ./scripts/build_gflags.sh, it is OK. However, when ANDROID_ABI is set to arm64-v8a, it still goes wrong:

-- Looking for C++ include unistd.h
-- Looking for C++ include unistd.h - not found
-- Looking for C++ include stdint.h
-- Looking for C++ include stdint.h - not found
-- Looking for C++ include inttypes.h
-- Looking for C++ include inttypes.h - not found
-- Looking for C++ include sys/types.h
-- Looking for C++ include sys/types.h - not found
-- Looking for C++ include sys/stat.h
-- Looking for C++ include sys/stat.h - not found
-- Looking for C++ include fnmatch.h
-- Looking for C++ include fnmatch.h - not found
-- Looking for C++ include stddef.h
-- Looking for C++ include stddef.h - not found
-- Check size of uint32_t
-- Check size of uint32_t - failed
-- Check size of u_int32_t
-- Check size of u_int32_t - failed
CMake Error at CMakeLists.txt:162 (message):
  Do not know how to define a 32-bit integer quantity on your system! Neither uint32_t, u_int32_t, nor __int32 seem to be available. Set GFLAGS_INTTYPES_FORMAT to either C99, BSD, or VC7 and try again.

blueardour commented 8 years ago

Besides, when I checked the caffe-android-lib/gflags-2.1.2/build/CMakeFiles/CMakeError.log file, I found:

/workspace/soft/android-ndk-r10d/platforms/android-21/arch-arm64/usr/lib/libc.so: undefined reference to `dlclose@LIBC'
/workspace/soft/android-ndk-r10d/platforms/android-21/arch-arm64/usr/lib/libc.so: undefined reference to `dlopen@LIBC'
/workspace/soft/android-ndk-r10d/platforms/android-21/arch-arm64/usr/lib/libc.so: undefined reference to `dlerror@LIBC'
/workspace/soft/android-ndk-r10d/platforms/android-21/arch-arm64/usr/lib/libc.so: undefined reference to `dladdr@LIBC'
/workspace/soft/android-ndk-r10d/platforms/android-21/arch-arm64/usr/lib/libc.so: undefined reference to `dlsym@LIBC'
/workspace/soft/android-ndk-r10d/platforms/android-21/arch-arm64/usr/lib/libc.so: undefined reference to `dl_iterate_phdr@LIBC'

I have hit this problem before. It can be solved by adding the link option: -Wl,-unresolved-symbols=ignore-in-shared-libs

sh1r0 commented 8 years ago

@blueardour Tested on Ubuntu 14.04 with cmake 3.5.2 and android-ndk-r11c via the command below, and got no errors.

ANDROID_ABI=arm64-v8a ./scripts/build_gflags.sh

blueardour commented 8 years ago

@sh1r0 Hi, I verified all the paths manually by dumping the compile and link flags (and also the ANDROID_ABI variable) in android-cmake/android.toolchain.cmake. Now the error is gone. (^_^)

raginisharma14 commented 7 years ago

Hi, could someone let me know which files we need to push onto the Android device after a successful build of caffe? Do I need to push the entire repo?

sfssqs commented 7 years ago

@sh1r0 Hi, could you please tell me which IDE you use for development? Can you debug the code step by step and set breakpoints?