openvinotoolkit / openvino_tokenizers

OpenVINO Tokenizers extension
Apache License 2.0

Cannot build OpenVINO Tokenizers with `BUILD_FAST_TOKENIZERS` in Clang #171

Closed: zhxie closed this issue 3 weeks ago

zhxie commented 1 month ago

Context

When building OpenVINO Tokenizers with Clang and libc++ (`-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_CXX_FLAGS="-stdlib=libc++" -DCMAKE_EXE_LINKER_FLAGS="-stdlib=libc++ -lc++abi" -DCMAKE_SHARED_LINKER_FLAGS="-stdlib=libc++ -lc++abi"`) and `-DBUILD_FAST_TOKENIZERS=ON`, the build fails.
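For reference, the failing configuration can be reproduced with a small shell helper like the one below. It is a sketch, not the project's documented build procedure: the function name `configure_with_clang` and its two arguments are hypothetical, and it assumes OpenVINO itself is already discoverable by CMake (for example via `OpenVINO_DIR`) and CMake 3.13 or newer for `-S`/`-B`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: configure OpenVINO Tokenizers with Clang + libc++ and
# the fast_tokenizer dependency enabled, using the flags from the report.
# Assumes OpenVINO is already discoverable by CMake (e.g. via OpenVINO_DIR).
configure_with_clang() {
  local src_dir="$1"    # checkout of openvino_tokenizers
  local build_dir="$2"  # out-of-source build directory
  cmake -S "$src_dir" -B "$build_dir" \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_CXX_FLAGS="-stdlib=libc++" \
    -DCMAKE_EXE_LINKER_FLAGS="-stdlib=libc++ -lc++abi" \
    -DCMAKE_SHARED_LINKER_FLAGS="-stdlib=libc++ -lc++abi" \
    -DBUILD_FAST_TOKENIZERS=ON
}
```

Usage, assuming a checkout in `openvino_tokenizers/`: `configure_with_clang openvino_tokenizers build && cmake --build build`.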

The root cause is that fast_tokenizer, a dependency of OpenVINO Tokenizers, cannot be built with Clang. An issue has been filed in PaddleNLP: https://github.com/PaddlePaddle/PaddleNLP/issues/8565.

As a workaround, I changed `src/icu4c.patch` to the following:

```diff
diff --git a/fast_tokenizer/cmake/external/icu.cmake b/fast_tokenizer/cmake/external/icu.cmake
index cd604d38..6be44bdb 100644
--- a/fast_tokenizer/cmake/external/icu.cmake
+++ b/fast_tokenizer/cmake/external/icu.cmake
@@ -113,7 +113,7 @@ ExternalProject_Add(
         GIT_PROGRESS      1
         PREFIX            ${ICU_PREFIX_DIR}
         UPDATE_COMMAND    ""
-        CONFIGURE_COMMAND ${HOST_ENV_CMAKE} ../extern_icu/icu4c/source/runConfigureICU "Linux/gcc" --enable-static --disable-shared --enable-rpath
+        CONFIGURE_COMMAND ${HOST_ENV_CMAKE} ../extern_icu/icu4c/source/runConfigureICU "Linux" --enable-static --enable-rpath
         BUILD_COMMAND make -j4
         INSTALL_COMMAND make install prefix="" DESTDIR=${ICU_INSTALL_DIR} install
         BUILD_BYPRODUCTS ${ICU_LIBRARIES}
```

This removes the hardcoded GCC compiler (`"Linux/gcc"` becomes `"Linux"`) and also drops `--disable-shared`. The patch works in fast_tokenizer itself, but not in OpenVINO Tokenizers: the build proceeds exactly as it does without the patch, and the dependency source that CMake downloads is unchanged after patching. This can be checked by running `cat fast_tokenizer/cmake/external/icu.cmake | grep shared` in the CMake build directory.
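The `grep` check from the report can be wrapped in a small shell helper that distinguishes an unpatched `icu.cmake` from a patched one. This is a hypothetical sketch: `check_icu_patch` is not part of either project, and the default path simply mirrors the one used in the report; the actual location of the downloaded fast_tokenizer sources may differ depending on the build layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the icu4c workaround made it into the
# fast_tokenizer sources that CMake downloaded.
# Exit status: 0 = patched, 1 = unpatched, 2 = file not found.
# The default path mirrors the report and may differ between build layouts.
check_icu_patch() {
  local icu_cmake="${1:-fast_tokenizer/cmake/external/icu.cmake}"
  if [[ ! -f "$icu_cmake" ]]; then
    echo "missing: $icu_cmake"
    return 2
  fi
  # The unpatched file passes "Linux/gcc" and --disable-shared to runConfigureICU.
  if grep -q -e '"Linux/gcc"' -e '--disable-shared' "$icu_cmake"; then
    echo "unpatched: $icu_cmake"
    return 1
  fi
  echo "patched: $icu_cmake"
}
```

Running `check_icu_patch` from the build directory after configuring would print `unpatched: ...` in the situation described above, where the patch is not reflected in the downloaded sources.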

What needs to be done?

Example Pull Requests

No response

Resources

Contact points

@ilya-lavrenov

Ticket

No response

mryzhov commented 4 weeks ago

It looks like the patch has not been applied; looking for the root cause.

mryzhov commented 3 weeks ago

@zhxie The issue should be fixed. Could you please take a look?