mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

how to build mlc-llm-cli on Linux #119

Closed zhaoyang-star closed 1 year ago

zhaoyang-star commented 1 year ago

I want to run vicuna-7b on an NVIDIA GPU based on mlc-llm. I followed the instructions and made some changes:

  1. Install relax.

    git clone https://github.com/mlc-ai/relax.git --recursive
    cd relax
    mkdir build
    cp cmake/config.cmake build

    In build/config.cmake, set USE_CUDA, USE_CUDNN, USE_CUBLAS and USE_LLVM to ON (a scripted way to do this is sketched at the end of this step).

    cd build
    cmake ..
    make -j
    export TVM_HOME=/path/to/relax
    export PYTHONPATH=$PYTHONPATH:$TVM_HOME/python
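
    If you prefer to script the config edit, here is a minimal sketch. It assumes the stock set(<OPT> OFF) lines from the template config.cmake, so double-check your copy if it differs:

    # Non-interactive alternative to hand-editing build/config.cmake (run from relax/).
    for opt in USE_CUDA USE_CUDNN USE_CUBLAS USE_LLVM; do
        sed -i "s/set(${opt} OFF)/set(${opt} ON)/" build/config.cmake
    done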
  2. Get model weights. I just used the following:

    git lfs install
    git clone https://huggingface.co/mlc-ai/demo-vicuna-v1-7b-int3 dist/vicuna-v1-7b
    mkdir -p dist/models
    ln -s path/to/vicuna-v1-7b dist/models/vicuna-v1-7b

    But config.json and some other necessary files are missing from the vicuna-v1-7b path! These weights have already been transformed by transform_params; we need the vicuna-v1-7b weights in PyTorch format.

    $ tree dist/
    dist/
    └── models
        └── vicuna-v1-7b
            ├── float16
            │   ├── ndarray-cache.json
            │   ├── tokenizer.model
            │   ├── params_shard_0.bin
            │   ├── params_shard_100.bin
            │   ├── params_shard_101.bin
            │   ├── params_shard_102.bin
            │   ├── params_shard_103.bin
            │   ├── params_shard_104.bin
            │   ├── params_shard_105.bin
    ...

I found that there is a vicuna-v1-7b in Hugging Face format whose dtype is float16 and whose vocab_size is 32001, so I downloaded that one.
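
A quick sanity check before building (a sketch: torch_dtype and vocab_size are standard keys in a Hugging Face config.json, and the path assumes the symlink from step 2):

    # Print the dtype and vocab size recorded in the checkpoint's config.
    grep -E '"(torch_dtype|vocab_size)"' dist/models/vicuna-v1-7b/config.json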

  3. Build the model into a library.

    git clone https://github.com/mlc-ai/mlc-llm.git --recursive
    cd mlc-llm
    # change vocab_size=32001 in llama.py
    python3 build.py --model vicuna-v1-7b --dtype float16 --target cuda --max-seq-len 768 --artifact-path ../dist/   

    The output:

    ...
    Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:10<00:00,  1.39it/s]
    Total param size: 3.9229860305786133 GB
    Start storing to cache ../dist/vicuna-v1-7b/float16/params
    [0745/0745] saving param_744
    All finished, 132 total shards committed, record saved to ../dist/vicuna-v1-7b/float16/params/ndarray-cache.json
    Save a cached module to ../dist/vicuna-v1-7b/float16/mod_cache_before_build_float16.pkl.
    20 static functions: [I.GlobalVar("rotary_embedding1"), I.GlobalVar("fused_decode1_fused_matmul5_multiply"), I.GlobalVar("decode4"), I.GlobalVar("slice1"), I.GlobalVar("squeeze"), I.GlobalVar("fused_decode_matmul3"), I.GlobalVar("fused_decode_fused_matmul3_add"), I.GlobalVar("fused_decode2_fused_matmul6_add"), I.GlobalVar("decode6"), I.GlobalVar("fused_transpose4_reshape4"), I.GlobalVar("transpose2"), I.GlobalVar("rms_norm1"), I.GlobalVar("fused_decode1_fused_matmul5_silu"), I.GlobalVar("reshape1"), I.GlobalVar("decode5"), I.GlobalVar("reshape"), I.GlobalVar("fused_reshape2_squeeze"), I.GlobalVar("reshape2"), I.GlobalVar("take_decode"), I.GlobalVar("fused_decode3_fused_matmul7_cast2")]
    26 dynamic functions: [I.GlobalVar("fused_NT_matmul1_add1"), I.GlobalVar("extend_te"), I.GlobalVar("full"), I.GlobalVar("reshape3"), I.GlobalVar("reshape5"), I.GlobalVar("rotary_embedding"), I.GlobalVar("take_decode1"), I.GlobalVar("fused_NT_matmul_divide_maximum_minimum_cast"), I.GlobalVar("NT_matmul1"), I.GlobalVar("fused_softmax1_cast4"), I.GlobalVar("fused_NT_matmul3_silu1"), I.GlobalVar("fused_NT_matmul2_divide1_maximum1_minimum1_cast3"), I.GlobalVar("matmul8"), I.GlobalVar("transpose5"), I.GlobalVar("fused_softmax_cast1"), I.GlobalVar("fused_min_max_triu_te_broadcast_to"), I.GlobalVar("reshape7"), I.GlobalVar("reshape8"), I.GlobalVar("slice"), I.GlobalVar("fused_NT_matmul3_multiply1"), I.GlobalVar("squeeze1"), I.GlobalVar("matmul4"), I.GlobalVar("transpose3"), I.GlobalVar("rms_norm"), I.GlobalVar("fused_NT_matmul4_add1"), I.GlobalVar("reshape6")]
    Dump static shape TIR to ../dist/vicuna-v1-7b/float16/mod_tir_static.py
    Dump dynamic shape TIR to ../dist/vicuna-v1-7b/float16/mod_tir_dynamic.py
    - Dispatch to pre-scheduled op: fused_decode1_fused_matmul5_multiply
    - Dispatch to pre-scheduled op: decode4
    - Dispatch to pre-scheduled op: fused_NT_matmul1_add1
    - Dispatch to pre-scheduled op: decode5
    - Dispatch to pre-scheduled op: NT_matmul1
    - Dispatch to pre-scheduled op: fused_NT_matmul_divide_maximum_minimum_cast
    - Dispatch to pre-scheduled op: fused_softmax1_cast4
    - Dispatch to pre-scheduled op: fused_NT_matmul3_silu1
    - Dispatch to pre-scheduled op: fused_NT_matmul2_divide1_maximum1_minimum1_cast3
    - Dispatch to pre-scheduled op: fused_decode_matmul3
    - Dispatch to pre-scheduled op: matmul8
    - Dispatch to pre-scheduled op: fused_softmax_cast1
    - Dispatch to pre-scheduled op: fused_min_max_triu_te_broadcast_to
    - Dispatch to pre-scheduled op: fused_decode_fused_matmul3_add
    - Dispatch to pre-scheduled op: fused_decode2_fused_matmul6_add
    - Dispatch to pre-scheduled op: decode6
    - Dispatch to pre-scheduled op: fused_decode1_fused_matmul5_silu
    - Dispatch to pre-scheduled op: fused_NT_matmul3_multiply1
    - Dispatch to pre-scheduled op: matmul4
    - Dispatch to pre-scheduled op: rms_norm
    - Dispatch to pre-scheduled op: fused_NT_matmul4_add1
    Finish exporting to ../dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
  4. Prepare lib and params. There are instructions for iOS, as follows. After step 3, the params are under ../dist/vicuna-v1-7b/float16/params and vicuna-v1-7b_cuda_float16.so is under ../dist/vicuna-v1-7b/float16/.

    cd ios
    ./prepare_libs.sh
    ./prepare_params.sh
  5. Build mlc-llm-cli. Is there any instruction on how to build mlc-llm-cli? I used the following but got an error. @MasterJH5574 Could you please give some advice? Thanks a lot ^_^

    mkdir build && cd build
    cmake .. && make
    [  0%] Building CXX object CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o
    [  0%] Built target mlc_llm_objs
    [  0%] Building CXX object CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o
    [  0%] Built target mlc_cli_objs
    [  0%] Generating release/libtokenizers_cpp.a
    No such file or directory
    make[2]: *** [tokenizers/CMakeFiles/tokenizers.dir/build.make:71: tokenizers/release/libtokenizers_cpp.a] Error 1
    make[1]: *** [CMakeFiles/Makefile2:674: tokenizers/CMakeFiles/tokenizers.dir/all] Error 2
    make: *** [Makefile:156: all] Error 2

I ran make -n to get debug info:

make -n
...
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_echo_color --switch= --green --progress-dir=/z/Dev/mlc-llm/build/CMakeFiles --progress-num=100 "Building CXX object tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o"
cd /z/Dev/mlc-llm/build/tvm && /usr/bin/c++ -DDMLC_USE_FOPEN64=0 -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -DNDEBUG -DNDEBUG=1 -DTVM_INDEX_DEFAULT_I64=1 -DTVM_THREADPOOL_USE_OPENMP=0 -DTVM_USE_LIBBACKTRACE=0 -DUSE_FALLBACK_STL_MAP=0 -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/libcrc/include -isystem /z/Dev/relax/3rdparty/dlpack/include -isystem /z/Dev/relax/3rdparty/dmlc-core/include -isystem /z/Dev/relax/3rdparty/rang/include -isystem /z/Dev/relax/3rdparty/compiler-rt -isystem /z/Dev/relax/3rdparty/picojson -isystem /usr/local/cuda/include -std=c++17 -faligned-new -O2 -Wall -fPIC -std=c++17  -O2 -g -DNDEBUG -fPIC -ffile-prefix-map=..=/z/Dev/relax -MD -MT tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o -MF CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o.d -o CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/sort/sort.cc.o -c /z/Dev/relax/src/runtime/contrib/sort/sort.cc
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_echo_color --switch= --progress-dir=/z/Dev/mlc-llm/build/CMakeFiles --progress-num=95,96,97,98,99,100 "Built target tvm_runtime_objs"
make -s -f tvm/CMakeFiles/tvm_runtime.dir/build.make tvm/CMakeFiles/tvm_runtime.dir/depend
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/relax /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/tvm /z/Dev/mlc-llm/build/tvm/CMakeFiles/tvm_runtime.dir/DependInfo.cmake --color=
make -s -f tvm/CMakeFiles/tvm_runtime.dir/build.make tvm/CMakeFiles/tvm_runtime.dir/build
make[2]: *** No rule to make target 'tvm/CMakeFiles/tvm_runtime_objs.dir/src/runtime/builtin_fp16.cc.o', needed by 'tvm/libtvm_runtime.so'.  Stop.
make[1]: *** [CMakeFiles/Makefile2:438: tvm/CMakeFiles/tvm_runtime.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

mlc-llm commit id: 909f267; relax commit id: 227cacd

AlphaAtlas commented 1 year ago

There are a couple of things here

yzh119 commented 1 year ago

Hi @zhaoyang-star, would you mind checking whether you have cloned the submodules? (If 3rdparty/tokenizers-cpp is empty, the submodules were not cloned properly.) If not, please update them via:

git submodule update --init --recursive
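
A quick way to verify the submodule state afterwards (standard git; an uninitialized submodule is shown with a - prefix):

    git submodule status          # a '-' prefix means not initialized
    ls 3rdparty/tokenizers-cpp    # should not be empty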
zhaoyang-star commented 1 year ago

git submodule update --init --recursive

I am sure I had all the submodules before building; nothing changed after running git submodule update --init --recursive in the mlc-llm project. @yzh119

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# ls
argparse  sentencepiece-js  tokenizers-cpp

BTW, tokenizers-cpp is not a git submodule.

I used cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON && make to get debug info. The output:

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# make 
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -P /z/Dev/mlc-llm/build/CMakeFiles/VerifyGlobs.cmake
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -S/z/Dev/mlc-llm -B/z/Dev/mlc-llm/build --check-build-system CMakeFiles/Makefile.cmake 0
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_progress_start /z/Dev/mlc-llm/build/CMakeFiles /z/Dev/mlc-llm/build//CMakeFiles/progress.marks
make  -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/z/Dev/mlc-llm/build'
make  -f CMakeFiles/mlc_llm_objs.dir/build.make CMakeFiles/mlc_llm_objs.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/CMakeFiles/mlc_llm_objs.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make  -f CMakeFiles/mlc_llm_objs.dir/build.make CMakeFiles/mlc_llm_objs.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[  0%] Building CXX object CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o
/usr/bin/c++ -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -DMLC_LLM_EXPORTS -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/dlpack/include -I/z/Dev/relax/3rdparty/dmlc-core/include -I/z/Dev/mlc-llm/3rdparty/sentencepiece-js/sentencepiece/src -I/z/Dev/mlc-llm/3rdparty/tokenizers-cpp -std=c++17  -O2 -g -DNDEBUG -fPIC -MD -MT CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o -MF CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o.d -o CMakeFiles/mlc_llm_objs.dir/cpp/llm_chat.cc.o -c /z/Dev/mlc-llm/cpp/llm_chat.cc
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
[  0%] Built target mlc_llm_objs
make  -f CMakeFiles/mlc_cli_objs.dir/build.make CMakeFiles/mlc_cli_objs.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/CMakeFiles/mlc_cli_objs.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make  -f CMakeFiles/mlc_cli_objs.dir/build.make CMakeFiles/mlc_cli_objs.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[  0%] Building CXX object CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o
/usr/bin/c++ -DDMLC_USE_LOGGING_LIBRARY="<tvm/runtime/logging.h>" -I/z/Dev/relax/include -I/z/Dev/relax/3rdparty/dlpack/include -I/z/Dev/relax/3rdparty/dmlc-core/include -I/z/Dev/mlc-llm/3rdparty/argparse/include -std=c++17  -O2 -g -DNDEBUG -fPIC -MD -MT CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o -MF CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o.d -o CMakeFiles/mlc_cli_objs.dir/cpp/cli_main.cc.o -c /z/Dev/mlc-llm/cpp/cli_main.cc
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
[  0%] Built target mlc_cli_objs
make  -f tokenizers/CMakeFiles/tokenizers.dir/build.make tokenizers/CMakeFiles/tokenizers.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/build'
cd /z/Dev/mlc-llm/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/build /z/Dev/mlc-llm/build/tokenizers /z/Dev/mlc-llm/build/tokenizers/CMakeFiles/tokenizers.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make  -f tokenizers/CMakeFiles/tokenizers.dir/build.make tokenizers/CMakeFiles/tokenizers.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/build'
[  0%] Generating release/libtokenizers_cpp.a
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E env CARGO_TARGET_DIR=/z/Dev/mlc-llm/build/tokenizers RUSTFLAGS="" cargo build --release
No such file or directory
make[2]: *** [tokenizers/CMakeFiles/tokenizers.dir/build.make:74: tokenizers/release/libtokenizers_cpp.a] Error 1
make[2]: Leaving directory '/z/Dev/mlc-llm/build'
make[1]: *** [CMakeFiles/Makefile2:677: tokenizers/CMakeFiles/tokenizers.dir/all] Error 2
make[1]: Leaving directory '/z/Dev/mlc-llm/build'
make: *** [Makefile:159: all] Error 2

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# ls
argparse  sentencepiece-js  tokenizers-cpp
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty# tree tokenizers-cpp/
tokenizers-cpp/
|-- CMakeLists.txt
|-- Cargo.toml
|-- src
|   `-- lib.rs
`-- tokenizers.h

1 directory, 4 files
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/3rdparty/tokenizers-cpp# ls /z/Dev/mlc-llm/build/tokenizers
CMakeFiles  Makefile  cmake_install.cmake

It seems the error happened when compiling tokenizers-cpp. I have verified that the two paths /z/Dev/mlc-llm/3rdparty/tokenizers-cpp and /z/Dev/mlc-llm/build/tokenizers are accessible.

yzh119 commented 1 year ago

The compilation of tokenizers-cpp depends on Rust; can you confirm you have installed Rust?

zhaoyang-star commented 1 year ago

The compilation of tokenizers-cpp depends on Rust; can you confirm you have installed Rust?

Yes, the compilation of tokenizers-cpp depends on Rust. Rust was not installed on my device; I will install it first. Thanks for your kind help.

zhaoyang-star commented 1 year ago

Rust is installed now.

# which rustc
/usr/bin/rustc
# rustc --version
rustc 1.65.0
# cd build; cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON; make

The same error occurred.

I tried to compile tokenizers-cpp alone. Again, the error happened when generating release/libtokenizers_cpp.a. How do you compile tokenizers-cpp? @yzh119

# cd 3rdparty/tokenizers-cpp/
# mkdir build; cd build
# cmake .. -DCMAKE_VERBOSE_MAKEFILE=ON
# make
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -S/z/Dev/mlc-llm/3rdparty/tokenizers-cpp -B/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build --check-build-system CMakeFiles/Makefile.cmake 0
/z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_progress_start /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build/CMakeFiles /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build//CMakeFiles/progress.marks
make  -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make  -f CMakeFiles/tokenizers.dir/build.make CMakeFiles/tokenizers.dir/depend
make[2]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_depends "Unix Makefiles" /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/3rdparty/tokenizers-cpp /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build /z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build/CMakeFiles/tokenizers.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make  -f CMakeFiles/tokenizers.dir/build.make CMakeFiles/tokenizers.dir/build
make[2]: Entering directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
[100%] Generating release/libtokenizers_cpp.a
cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp && /z/env_init/env_tvm/lib/python3.8/site-packages/cmake/data/bin/cmake -E env CARGO_TARGET_DIR=/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build RUSTFLAGS="" cargo build --release
No such file or directory
make[2]: *** [CMakeFiles/tokenizers.dir/build.make:74: release/libtokenizers_cpp.a] Error 1
make[2]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make[1]: *** [CMakeFiles/Makefile2:86: CMakeFiles/tokenizers.dir/all] Error 2
make[1]: Leaving directory '/z/Dev/mlc-llm/3rdparty/tokenizers-cpp/build'
make: *** [Makefile:94: all] Error 2
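
Judging from the verbose log, the generated rule just shells out to cargo, so the failure can be reproduced by hand with the same environment (a sketch mirroring the command shown in the log above):

    cd /z/Dev/mlc-llm/3rdparty/tokenizers-cpp
    # Same invocation the generated Makefile runs; "No such file or directory"
    # here typically means the cargo binary itself cannot be found on PATH.
    CARGO_TARGET_DIR=build RUSTFLAGS="" cargo build --release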
yzh119 commented 1 year ago

@zhaoyang-star We rely on cargo, Rust's package manager, not only rustc.
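
If cargo is missing (distro rustc packages do not always ship it), the usual route is rustup, which installs rustc and cargo together. A standard sketch, not specific to mlc-llm:

    # Official rustup installer; places cargo and rustc under ~/.cargo/bin.
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source "$HOME/.cargo/env"
    cargo --version    # verify cargo, not just rustc, is on PATH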

zhaoyang-star commented 1 year ago

I finally built mlc_chat_cli and libmlc_llm.so after installing the Rust dev environment.

When I ran mlc_chat_cli, it errored out saying libtvm_runtime.so was not compiled with the CUDA runtime. I am sure USE_CUDA, USE_CUDNN, USE_CUBLAS and USE_LLVM were ON when compiling TVM, and the TVM_HOME env var points to /z/Dev/relax/. The unit test test_cudnn.py also passed.

@yzh119 Could you please have a look at it? Is there something I am still missing? Thanks a lot.

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ls
CMakeCache.txt  CPackConfig.cmake        Makefile             cmake_install.cmake  libmlc_llm.so  sentencepiece  tvm
CMakeFiles      CPackSourceConfig.cmake  TVMBuildOptions.txt  libmlc_llm.a         mlc_chat_cli   tokenizers
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
[00:00:37] /z/Dev/relax/src/runtime/library_module.cc:126: Binary was created using {cuda} but a loader of that name is not registered. Available loaders are VMExecutable, relax.Executable, metadata, const_loader, metadata_module. Perhaps you need to recompile with this runtime enabled.
Stack trace:
  [bt] (0) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x2c) [0x7fa7cbce2efc]
  [bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x45) [0x55d806d97d61]
  [bt] (2) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::LoadModuleFromBinary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, dmlc::Stream*)+0x3d3) [0x7fa7cbce0013]
  [bt] (3) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::ProcessModuleBlob(char const*, tvm::runtime::ObjectPtr<tvm::runtime::Library>, std::function<tvm::runtime::PackedFunc (int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)>, tvm::runtime::Module*, tvm::runtime::ModuleNode**)+0x590) [0x7fa7cbce06a0]
  [bt] (4) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::CreateModuleFromLibrary(tvm::runtime::ObjectPtr<tvm::runtime::Library>, std::function<tvm::runtime::PackedFunc (int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)>)+0x221) [0x7fa7cbce1601]
  [bt] (5) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(+0xcd71f) [0x7fa7cbccd71f]
  [bt] (6) /z/Dev/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Module::LoadFromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x20e) [0x7fa7cbcec26e]
  [bt] (7) ./mlc_chat_cli(+0x845c) [0x55d806d9945c]
  [bt] (8) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa7cba32083]
tqchen commented 1 year ago

You need to set USE_CUDA=ON when compiling mlc_llm
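
Concretely, that means passing the flag when configuring the mlc-llm build. A minimal sketch, assuming a clean build directory:

    cd mlc-llm
    mkdir -p build && cd build
    cmake .. -DUSE_CUDA=ON    # bakes the CUDA loader into the bundled TVM runtime
    make -j$(nproc)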

zhaoyang-star commented 1 year ago

Thanks @tqchen for your kind help. Output:

(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/ --evaluate
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
Initializing the chat module...
Finish loading
You can use the following special commands:
  /help    print the special commands
  /exit    quit the cli
  /stats   print out the latest stats (token/sec)
  /reset   restart a fresh chat

[18:41:39] /z/Dev/mlc-llm/cpp/llm_chat.cc:749: logits[:10] =[-7.34375, -6.17969, 5.78125, -1.62012, -3.20312, -2.6543, -0.955566, -4.88672, -4.14844, -1.96777]
[18:41:39] /z/Dev/mlc-llm/cpp/llm_chat.cc:754: encoding-time=527.079ms, decoding-time=36.8855ms.
(env_tvm) root@12800db2b9db:/z/Dev/mlc-llm/build# ./mlc_chat_cli --device-name=cuda --artifact-path=/z/Dev/dist/
Use lib /z/Dev/dist/vicuna-v1-7b/float16/vicuna-v1-7b_cuda_float16.so
Initializing the chat module...
Finish loading
You can use the following special commands:
  /help    print the special commands
  /exit    quit the cli
  /stats   print out the latest stats (token/sec)
  /reset   restart a fresh chat

USER: Who is Lionel Messi?                    
ASSISTANT: Lionel Messi is a professional soccer player who was born on June 24, 1987, in Rosario, Argentina. He is widely considered to be one of the greatest soccer players of all time. Messi grew up in a family of soccer players and began playing the sport at a young age. He eventually joined the youth academy of Spanish club Barcelona, where he made his professional debut at the age of 17. Since then, Messi has established himself as one of the most talented and skilled players in the world, winning numerous accolades and helping Barcelona to numerous championships. He is known for his exceptional speed, agility, and ball control, as well as his ability to score goals and create opportunities for his teammates. Off the field, Messi is known for his charitable efforts and his commitment to promoting the sport of soccer in his home country of Argentina.
USER: /stats
encode: 60.9 tok/s, decode: 21.1 tok/s
sleepwalker2017 commented 1 year ago

OMG, it takes a lot to build this project. OK, so why is there no README???

Poordeveloper commented 1 year ago

:(

njuhang commented 1 year ago

You need to set USE_CUDA=ON when compiling mlc_llm

What if I don't have a CUDA device on my computer? Thanks.

junrushao commented 1 year ago

Please check out this page for building mlc_chat_cli: https://mlc.ai/mlc-llm/docs/tutorials/runtime/cpp.html

Kuchiriel commented 6 months ago

Please check out this page for building mlc_chat_cli: https://mlc.ai/mlc-llm/docs/tutorials/runtime/cpp.html

Page not found

tqchen commented 6 months ago

The CLI is now deprecated in the new version; check out https://llm.mlc.ai/docs/deploy/cli.html for the latest instructions.