mlc-ai / tokenizers-cpp

Universal cross-platform tokenizers binding to HF and sentencepiece
Apache License 2.0

SentencePiece Build Error - ld: error: undefined symbol: __android_log_write #7

Closed zjc664656505 closed 11 months ago

zjc664656505 commented 11 months ago

Dear mlc-ai developers,

I'm currently deploying the tokenizer to my Android environment. I have successfully built the Hugging Face tokenizer. However, when I try to build the sentencepiece tokenizer, I get this error:

: && /Users/junchenzhao/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ --target=aarch64-none-linux-android24 --sysroot=/Users/junchenzhao/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/sysroot -O3 -Wall -fPIC -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security  -std=c++17 -fmacro-prefix-map=/Users/junchenzhao/Dist-CPU-Learn/android/distributed_inference_demo/test1/src/main/cpp/='' -fno-limit-debug-info -static-libstdc++ -Wl,--build-id=sha1 -Wl,--no-rosegment -Wl,--fatal-warnings -Wl,--gc-sections -Wl,--no-undefined -Qunused-arguments -Wl,--gc-sections tokenizers_cpp/sentencepiece/src/CMakeFiles/spm_decode.dir/spm_decode_main.cc.o -o /Users/junchenzhao/Dist-CPU-Learn/android/distributed_inference_demo/test1/build/intermediates/cxx/Debug/6m5u3o15/obj/arm64-v8a/spm_decode  tokenizers_cpp/sentencepiece/src/libsentencepiece.a  -pthread  -latomic -lm && :

ld: error: undefined symbol: __android_log_write
>>> referenced by common.cc:150 (/test1/src/main/cpp/tokenizers-cpp/sentencepiece/third_party/protobuf-lite/common.cc:150)
>>>               common.cc.o:(google::protobuf::internal::DefaultLogHandler(google::protobuf::LogLevel, char const*, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive tokenizers_cpp/sentencepiece/src/libsentencepiece.a
>>> referenced by common.cc:158 (/test1/src/main/cpp/tokenizers-cpp/sentencepiece/third_party/protobuf-lite/common.cc:158)
>>>               common.cc.o:(google::protobuf::internal::DefaultLogHandler(google::protobuf::LogLevel, char const*, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)) in archive tokenizers_cpp/sentencepiece/src/libsentencepiece.a

Here is my CMakeLists.txt under the src/main/cpp folder:

# Sets the minimum version of CMake required to build the native library.
cmake_minimum_required(VERSION 3.18.1)

project(distributed_inference_demo C CXX)

add_library(
        # Sets the name of the library.
        distributed_inference_demo
        # Sets the library as a shared library.
        SHARED
        # Provides a relative path to your source file(s).
        native-lib.cpp
        utils.cpp
        inference.cpp
)

set(TOKENIZER_CPP_PATH ${CMAKE_SOURCE_DIR}/tokenizers-cpp)
add_subdirectory(${TOKENIZER_CPP_PATH} tokenizers_cpp)

target_include_directories(distributed_inference_demo PRIVATE
        ${CMAKE_SOURCE_DIR}/include/
        ${TOKENIZER_CPP_PATH}/include/)

add_library(onnxruntime SHARED IMPORTED)
set_target_properties(onnxruntime PROPERTIES IMPORTED_LOCATION ${CMAKE_SOURCE_DIR}/lib/libonnxruntime.so)

# Searches for a specified prebuilt library and stores the path as a
# variable. Because CMake includes system libraries in the search path by
# default, you only need to specify the name of the public NDK library
# you want to add. CMake verifies that the library exists before
# completing its build.
find_library(
        # Sets the name of the path variable.
        log-lib
        # Specifies the name of the NDK library that
        # you want CMake to locate.
        log
)

# Specifies libraries CMake should link to your target library. You
# can link multiple libraries, such as libraries you define in this
# build script, prebuilt third-party libraries, or system libraries.
target_link_libraries(
        distributed_inference_demo
        sentencepiece-static
        tokenizers_c
        tokenizers_cpp
        ${log-lib}
        onnxruntime

)

I'm not sure why this error keeps coming up. I directly cloned this repo and the corresponding sentencepiece repo.

Please let me know how to solve this issue.

Thanks a lot!

junrushao commented 11 months ago

Would you mind trying this out: https://stackoverflow.com/questions/37617919/error-undefined-reference-to-android-log-write-error

zjc664656505 commented 11 months ago

Hi Junru,

Thanks for your kind response.

I have checked this answer and don't think this approach works for me. My Android C++ backend is built with a CMakeLists.txt that already links the log library, so configuring an Android.mk file will not solve my issue.

zjc664656505 commented 11 months ago

I have rebuilt sentencepiece, but I still get the same error message when building tokenizers-cpp in Android Studio.

tqchen commented 11 months ago

This is likely due to a missing dependency on the log library, so the fix is to get CMake to link log on your end. I'm not sure how to debug further; maybe try different ways of linking the log library.
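
One way to try this without patching sentencepiece, offered as an untested sketch rather than a confirmed fix: the failing link step is for the spm_decode executable inside the sentencepiece subproject, which never sees the ${log-lib} that is linked to distributed_inference_demo. CMake's directory-scoped link_libraries command applies to every target created after it, including targets in subdirectories added later, so placing it before add_subdirectory might cover spm_decode as well. The assumption here is that `log` resolves to the NDK's liblog.

```cmake
# In src/main/cpp/CMakeLists.txt, before the tokenizers-cpp subproject is added.
# Assumption: on Android, `log` resolves to the NDK's liblog, which provides
# __android_log_write.
if(ANDROID)
  # Applies to all targets created after this point, including those
  # defined inside subdirectories added below.
  link_libraries(log)
endif()

set(TOKENIZER_CPP_PATH ${CMAKE_SOURCE_DIR}/tokenizers-cpp)
add_subdirectory(${TOKENIZER_CPP_PATH} tokenizers_cpp)
```

This is broader than strictly necessary, since it adds the log library to every subsequently defined target, but it keeps the third-party sources unmodified.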

zjc664656505 commented 11 months ago

Thanks for your reply. I found a workaround: if we manually link the log library to sentencepiece in sentencepiece/src/CMakeLists.txt, the build works.

Original: image

Modified: image
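
The screenshots are not preserved in this copy of the thread. A minimal sketch of what such a change might look like, assuming the sentencepiece-static target name used in the CMakeLists.txt above; the variable name ANDROID_LOG_LIB and the exact placement are hypothetical and not taken from the original images:

```cmake
# Hypothetical addition to sentencepiece/src/CMakeLists.txt, placed after the
# add_library(sentencepiece-static ...) call.
if(ANDROID)
  # Locate the NDK's liblog, which defines __android_log_write.
  find_library(ANDROID_LOG_LIB log)
  # INTERFACE passes the dependency on to everything that links the static
  # archive (for example spm_decode). If the file already uses the
  # keyword-less signature for this target, drop INTERFACE, because CMake
  # rejects mixing the two forms on one target.
  target_link_libraries(sentencepiece-static INTERFACE ${ANDROID_LOG_LIB})
endif()
```

The error output points at protobuf-lite's common.cc inside libsentencepiece.a, so attaching the log dependency to the archive itself means any executable that links it picks up liblog too.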

I will close this issue since it's resolved.

junrushao commented 11 months ago

Interesting! Thanks for sharing your workaround. This will definitely be useful if anyone encounters this issue in the future.