mlc-ai / tokenizers-cpp

Universal cross-platform tokenizers binding to HF and sentencepiece
Apache License 2.0

add rwkv world tokenizer #14

Closed BBuf closed 10 months ago

BBuf commented 10 months ago
[image]

This prepares for https://github.com/mlc-ai/mlc-llm/pull/848

BBuf commented 10 months ago

After the latest commit, the unit tests also run successfully.

[image]

Hzfengsy commented 10 months ago

Thanks @BBuf

junrushao commented 10 months ago

@BBuf @Hzfengsy This commit introduces a regression in the Windows build and blocks our nightly package rebuild. Please see the detailed logs below:

D:\a\package\package>cd mlc-llm 

D:\a\package\package\mlc-llm>rd /s /q build 
The system cannot find the file specified.

D:\a\package\package\mlc-llm>mkdir build 

D:\a\package\package\mlc-llm>cd build 

D:\a\package\package\mlc-llm\build>cmake -A x64 -Thost=x64       -G "Visual Studio 17 2022"       -DUSE_VULKAN=ON       .. 
-- The C compiler identification is MSVC 19.36.32538.0
-- The CXX compiler identification is MSVC 19.36.32538.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Enterprise/VC/Tools/MSVC/14.36.32532/bin/HostX64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Enterprise/VC/Tools/MSVC/14.36.32532/bin/HostX64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test SUPPORT_CXX17
-- Performing Test SUPPORT_CXX17 - Success
-- Setting default build type to RelWithDebInfo
-- Hide private symbols
-- TVM_HOME: 3rdparty/tvm
-- Found the path to ccache, enabling ccache
-- VTA build is skipped in Windows..
-- Vulkan_INCLUDE_DIRS=C:/Miniconda/envs/tlcpack-build/Library/includeC:/Miniconda/envs/tlcpack-build/Library/include/spirv-toolsC:/Miniconda/envs/tlcpack-build/Library/include/spirv/unified1C:/Miniconda/envs/tlcpack-build/Library/include/spirv/unified1
-- Vulkan_LIBRARY=C:/Miniconda/envs/tlcpack-build/Library/lib/vulkan-1.lib
-- Vulkan_SPIRV_TOOLS_LIBRARY=C:/Miniconda/envs/tlcpack-build/Library/lib/SPIRV-Tools.lib
-- Build with Vulkan support
-- Build with contrib.random
-- Build with contrib.sort
-- Build with contrib.hybriddump
-- Git found: C:/Miniconda/envs/tlcpack-build/Library/bin/git.exe
-- Found TVM_GIT_COMMIT_HASH=b0d1c21f329e6aed8fd639c530c29acc3b1f9305
-- Found TVM_GIT_COMMIT_TIME=2023-08-31 15:57:11 -0400
-- Building with TVM Map...
-- Build with thread support...
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE  
-- Performing Test FILE_PREFIX_MAP_SUPPORTED
-- Performing Test FILE_PREFIX_MAP_SUPPORTED - Failed
-- system-nameWindows
MSBuild version 17.7.2+d6990bcfa for .NET Framework

  1>Checking Build System
  1>Creating directories for 'msgpack-populate'
  Performing download step (git clone) for 'msgpack-populate'
  Cloning into 'msgpack-src'...
  HEAD is now at 8c602e85 Merge pull request #1083 from redboltz/update_610
  CMake Error at msgpack-subbuild/msgpack-populate-prefix/tmp/msgpack-populate-gitclone.cmake:62 (message):
    Failed to update submodules in:
-- Configuring incomplete, errors occurred!
    'D:/a/package/package/mlc-llm/build/_deps/msgpack-src'

C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(249,5): error MSB8066: Custom build for 'D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-download.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-update.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-patch.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-configure.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-build.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-install.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-test.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\df644a9fff8f1cab49663ee66d0ee69a\msgpack-populate-complete.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\b12dc4ee056b4292e5c2bb36792642c2\msgpack-populate.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeLists.txt' exited with code 1. [D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\msgpack-populate.vcxproj]

CMake Error at C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1662 (message):
  Build step for msgpack failed: 1
Call Stack (most recent call first):
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1802:EVAL:2 (__FetchContent_directPopulate)
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1802 (cmake_language)
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:2016 (FetchContent_Populate)
  3rdparty/tokenizers-cpp/CMakeLists.txt:86 (FetchContent_MakeAvailable)

D:\a\package\package\mlc-llm\build>cmake --build . --parallel 3 --config Release -- /m 
MSBuild version 17.7.2+d6990bcfa for .NET Framework
MSBUILD : error MSB1009: Project file does not exist.
Switch: ALL_BUILD.vcxproj

D:\a\package\package\mlc-llm\build>cd ..\..

Link to the nightly run: https://github.com/mlc-ai/package/actions/runs/6069714174/job/16464517741

I don't have a clear idea of how to fix this, since I don't have a Windows machine, but we usually prefer a git submodule to FetchContent because of network stability concerns.
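For readers less familiar with the two approaches, here is a rough CMake sketch of the difference. It is only an illustration under assumptions: the `3rdparty/msgpack` path, the `MSGPACK_DIR` variable, and the `<repository-url>` / `<tag>` placeholders are hypothetical and are not taken from the actual tokenizers-cpp `CMakeLists.txt`.

```cmake
# Sketch only: paths and names below are hypothetical, not the real project layout.

# FetchContent style: the dependency is cloned at configure time, so every
# configure run depends on the network (and on the dependency's own submodules).
#
#   include(FetchContent)
#   FetchContent_Declare(msgpack
#     GIT_REPOSITORY <repository-url>
#     GIT_TAG        <tag>)
#   FetchContent_MakeAvailable(msgpack)

# Submodule style: the dependency is already checked out in the source tree,
# so configuring needs no network access.
set(MSGPACK_DIR ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/msgpack)  # hypothetical path

# Fail early with a clear hint if the submodule has not been initialized.
if(NOT EXISTS ${MSGPACK_DIR}/CMakeLists.txt)
  message(FATAL_ERROR
    "msgpack submodule not found; run: git submodule update --init --recursive")
endif()

# Build the vendored copy; EXCLUDE_FROM_ALL keeps its targets out of the
# default build unless something links against them.
add_subdirectory(${MSGPACK_DIR} msgpack EXCLUDE_FROM_ALL)
```

With the submodule variant, the clone happens once when the repository is checked out rather than on every CMake configure, which is the step that failed in the log above.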

tqchen commented 10 months ago

@BBuf it would be great if we could follow up on this. In the meantime, we can pin tokenizers-cpp to an earlier version if needed. Indeed, a submodule would be better.

BBuf commented 10 months ago

I will change it to the submodule approach as soon as possible.

BBuf commented 10 months ago

https://github.com/mlc-ai/tokenizers-cpp/pull/15