mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Cannot run finetuned model of Mistral 7B with `mlc_llm convert_weights` with "data did not match any variant of untagged enum ModelWrapper" #3023

Closed pjyi2147 closed 1 week ago

pjyi2147 commented 1 week ago

🐛 Bug

Hi, I am trying to integrate mlc_llm into my research project and am having issues running the model with it.

Are finetuned models not supported yet?

To Reproduce

Steps to reproduce the behavior:

  1. Download castorini/rank_zephyr_7b_v1_full
  2. Convert weights of the model with the instruction: https://llm.mlc.ai/docs/compilation/convert_weights.html
  3. Run mlc_llm chat <converted_model_dir>
Command:
mlc_llm chat /home/tardis/shared/patrickyi/dist/models/rank_zephyr_7b_v1_full_mlc

Log file created at: 2024-11-12T20:39:09Z

Output:
[2024-11-12 20:39:11] INFO auto_device.py:79: Found device: cuda:0
[2024-11-12 20:39:13] INFO auto_device.py:88: Not found device: rocm:0
[2024-11-12 20:39:14] INFO auto_device.py:88: Not found device: metal:0
[2024-11-12 20:39:15] INFO auto_device.py:79: Found device: vulkan:0
[2024-11-12 20:39:17] INFO auto_device.py:88: Not found device: opencl:0
[2024-11-12 20:39:17] INFO auto_device.py:35: Using device: cuda:0
[2024-11-12 20:39:17] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-11-12 20:39:17] INFO jit.py:158: Using cached model lib: /home/tardis/shared/patrickyi/.cache/mlc_llm/model_lib/6b66c41e3aa7dc277d98a61171a53ffd.so
thread '<unnamed>' panicked at src/lib.rs:26:50:
called `Result::unwrap()` on an `Err` value: Error("data did not match any variant of untagged enum ModelWrapper", line: 268063, column: 1)
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: tokenizers_new_from_str
   4: _ZN10tokenizers9Tokenizer12FromBlobJSONERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
             at /workspace/mlc-llm/3rdparty/tokenizers-cpp/src/huggingface_tokenizer.cc:108:63
   5: _ZN3mlc3llm9Tokenizer8FromPathERKN3tvm7runtime6StringESt8optionalINS0_13TokenizerInfoEE
             at /workspace/mlc-llm/cpp/tokenizers/tokenizers.cc:157:57
   6: operator()
             at /workspace/mlc-llm/cpp/tokenizers/tokenizers.cc:459:34
   7: run<tvm::runtime::TVMMovableArgValueWithContext_>
             at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/packed_func.h:1974:11
   8: run<>
             at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/packed_func.h:1959:60
   9: unpack_call<mlc::llm::Tokenizer, 1, mlc::llm::<lambda(const tvm::runtime::String&)> >
             at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/packed_func.h:1999:46
  10: operator()
             at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/packed_func.h:2059:44
  11: Call
             at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/packed_func.h:1394:58
  12: TVMFuncCall
  13: _ZL39__pyx_f_3tvm_4_ffi_4_cy3_4core_FuncCallPvP7_objectP8TVMValuePi
  14: _ZL76__pyx_pw_3tvm_4_ffi_4_cy3_4core_10ObjectBase_3__init_handle_by_constructor__P7_objectPKS0_lS0_
  15: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  16: method_vectorcall
             at /usr/local/src/conda/python-3.10.13/Objects/classobject.c:53:18
  17: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  18: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  19: call_function
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  20: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4181:23
  21: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  22: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  23: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  24: _PyObject_FastCallDictTstate
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:142:15
  25: _PyObject_Call_Prepend
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:431:24
  26: slot_tp_init
             at /usr/local/src/conda/python-3.10.13/Objects/typeobject.c:7734:15
  27: type_call
             at /usr/local/src/conda/python-3.10.13/Objects/typeobject.c:1135:19
  28: _PyObject_MakeTpCall
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:215:18
  29: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:112:16
  30: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:99:1
  31: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  32: call_function
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  33: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4213:19
  34: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  35: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  36: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  37: _PyObject_FastCallDictTstate
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:153:15
  38: _PyObject_Call_Prepend
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:431:24
  39: slot_tp_init
             at /usr/local/src/conda/python-3.10.13/Objects/typeobject.c:7734:15
  40: type_call
             at /usr/local/src/conda/python-3.10.13/Objects/typeobject.c:1135:19
  41: _PyObject_MakeTpCall
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:215:18
  42: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:112:16
  43: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:99:1
  44: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  45: call_function
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  46: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4231:19
  47: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  48: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  49: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  50: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  51: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  52: call_function
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  53: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4231:19
  54: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  55: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  56: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  57: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  58: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  59: call_function
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  60: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4181:23
  61: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  62: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  63: _PyFunction_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Objects/call.c:342:16
  64: _PyObject_VectorcallTstate
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114:11
  65: PyObject_Vectorcall
             at /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:123:12
  66: call_function
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5893:13
  67: _PyEval_EvalFrameDefault
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:4213:19
  68: _PyEval_EvalFrame
             at /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46:12
  69: _PyEval_Vector
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:5067:24
  70: PyEval_EvalCode
             at /usr/local/src/conda/python-3.10.13/Python/ceval.c:1134:12
  71: run_eval_code_obj
             at /usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1291:9
  72: run_mod
             at /usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1312:19
  73: pyrun_file
             at /usr/local/src/conda/python-3.10.13/Python/pythonrun.c:1208:15
  74: _PyRun_SimpleFileObject
             at /usr/local/src/conda/python-3.10.13/Python/pythonrun.c:456:13
  75: _PyRun_AnyFileObject
             at /usr/local/src/conda/python-3.10.13/Python/pythonrun.c:90:15
  76: pymain_run_file_obj
             at /usr/local/src/conda/python-3.10.13/Modules/main.c:357:15
  77: pymain_run_file
             at /usr/local/src/conda/python-3.10.13/Modules/main.c:376:15
  78: pymain_run_python
             at /usr/local/src/conda/python-3.10.13/Modules/main.c:591:21
  79: Py_RunMain
             at /usr/local/src/conda/python-3.10.13/Modules/main.c:670:5
  80: Py_BytesMain
             at /usr/local/src/conda/python-3.10.13/Modules/main.c:1090:12
  81: <unknown>
  82: __libc_start_main
  83: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
/home/tardis/shared/patrickyi/log/log_command.sh: line 67: 2276887 Aborted                 (core dumped) $COMMAND

Expected behavior

mlc_llm chat server waiting for prompts

Environment

USE_NVTX: OFF USE_GTEST: AUTO SUMMARIZE: OFF TVM_DEBUG_WITH_ABI_CHANGE: OFF USE_IOS_RPC: OFF USE_MSC: OFF USE_ETHOSU: CUDA_VERSION: 12.2 USE_LIBBACKTRACE: AUTO DLPACK_PATH: 3rdparty/dlpack/include USE_TENSORRT_CODEGEN: OFF USE_THRUST: ON USE_TARGET_ONNX: OFF USE_AOT_EXECUTOR: ON BUILD_DUMMY_LIBTVM: OFF USE_CUDNN: OFF USE_TENSORRT_RUNTIME: OFF USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF USE_CCACHE: AUTO USE_ARM_COMPUTE_LIB: OFF USE_CPP_RTVM: USE_OPENCL_GTEST: /path/to/opencl/gtest TVM_LOG_BEFORE_THROW: OFF USE_MKL: OFF USE_PT_TVMDSOOP: OFF MLIR_VERSION: NOT-FOUND USE_CLML: OFF USE_STACKVM_RUNTIME: OFF USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF ROCM_PATH: /opt/rocm USE_DNNL: OFF USE_MSCCL: OFF USE_NNAPI_RUNTIME: OFF USE_VITIS_AI: OFF USE_MLIR: OFF USE_RCCL: OFF USE_LLVM: llvm-config --ignore-libllvm --link-static USE_VERILATOR: OFF USE_TF_TVMDSOOP: OFF USE_THREADS: ON USE_MSVC_MT: OFF BACKTRACE_ON_SEGFAULT: OFF USE_GRAPH_EXECUTOR: ON USE_NCCL: ON USE_ROCBLAS: OFF GIT_COMMIT_HASH: 79a69ae4a92c9d4f23e62f93ce5b0d90ed29e5ed USE_VULKAN: ON USE_RUST_EXT: OFF USE_CUTLASS: ON USE_CPP_RPC: OFF USE_HEXAGON: OFF USE_CUSTOM_LOGGING: OFF USE_UMA: OFF USE_FALLBACK_STL_MAP: OFF USE_SORT: ON USE_RTTI: ON GIT_COMMIT_TIME: 2024-11-11 00:56:50 -0500 USE_HIPBLAS: OFF USE_HEXAGON_SDK: /path/to/sdk USE_BLAS: none USE_ETHOSN: OFF USE_LIBTORCH: OFF USE_RANDOM: ON USE_CUDA: ON USE_COREML: OFF USE_AMX: OFF BUILD_STATIC_RUNTIME: OFF USE_CMSISNN: OFF USE_KHRONOS_SPIRV: OFF USE_CLML_GRAPH_EXECUTOR: OFF USE_TFLITE: OFF USE_HEXAGON_GTEST: /path/to/hexagon/gtest PICOJSON_PATH: 3rdparty/picojson USE_OPENCL_ENABLE_HOST_PTR: OFF INSTALL_DEV: OFF USE_PROFILER: ON USE_NNPACK: OFF LLVM_VERSION: 17.0.6 USE_MRVL: OFF USE_OPENCL: OFF COMPILER_RT_PATH: 3rdparty/compiler-rt USE_NNAPI_CODEGEN: OFF RANG_PATH: 3rdparty/rang/include USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF USE_OPENMP: OFF USE_BNNS: OFF USE_FLASHINFER: ON USE_CUBLAS: ON USE_METAL: OFF USE_MICRO_STANDALONE_RUNTIME: OFF USE_HEXAGON_EXTERNAL_LIBS: OFF 
USE_ALTERNATIVE_LINKER: AUTO USE_BYODT_POSIT: OFF USE_NVSHMEM: OFF USE_HEXAGON_RPC: OFF USE_MICRO: OFF DMLC_PATH: 3rdparty/dmlc-core/include INDEX_DEFAULT_I64: ON USE_RELAY_DEBUG: OFF USE_RPC: ON USE_TENSORFLOW_PATH: none TVM_CLML_VERSION: USE_MIOPEN: OFF USE_ROCM: OFF USE_PAPI: OFF USE_CURAND: OFF TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++ HIDE_PRIVATE_SYMBOLS: ON

Additional context

pjyi2147 commented 1 week ago

Maybe something similar to this issue? https://github.com/mlc-ai/mlc-llm/issues/2447

MasterJH5574 commented 1 week ago

Hi @pjyi2147, we can support this model, and the issue is not from the tokenizer. May I ask which tokenizers version you have installed? It needs to be at least 0.19.1 to work well:

> pip list | grep "tokenizers"
tokenizers                    0.19.1

Please update your tokenizers package to the latest version if yours is older than 0.19.1.
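
The same check can be scripted instead of reading `pip list` output. This is a minimal sketch, assuming the Python package is named `tokenizers` and using the 0.19.1 minimum quoted above; the helper names `parse_version`, `meets_minimum`, and `check_tokenizers` are hypothetical, and the version parsing is deliberately simple (it ignores suffixes such as `.dev61`).

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.19.1' into (0, 19, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="0.19.1"):
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def check_tokenizers(minimum="0.19.1"):
    """Look up the installed Python `tokenizers` package, if any."""
    try:
        installed = metadata.version("tokenizers")
    except metadata.PackageNotFoundError:
        return "tokenizers is not installed"
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    return f"tokenizers {installed}: {status} (minimum {minimum})"


if __name__ == "__main__":
    print(check_tokenizers())
```

Note that this only inspects the Python package in the active environment; it says nothing about a tokenizers library compiled into a prebuilt wheel.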

Separately, we fixed a sliding-window bug (unrelated to this issue) in #3026, so please update to the latest nightly package tomorrow, or check out the latest codebase, to pick up that fix. Thanks.

pjyi2147 commented 1 week ago

Hello @MasterJH5574. I just checked, and my tokenizers package version is 0.20.1. I will try the whole process again this weekend to see whether the error changes.

MasterJH5574 commented 1 week ago

@pjyi2147 I see. Then I guess it's not the Python package but the Rust package. Did you build mlc-llm from source? (I assume so?) If so, we may need to check the Rust tokenizers package version by running:

> cd 3rdparty/tokenizers-cpp/rust
> cargo check --package tokenizers
    ...
    Checking tokenizers v0.19.1
    Finished dev [unoptimized + debuginfo] target(s) in 6.17s

MasterJH5574 commented 1 week ago

This is because the package requirement comes from the Rust side: https://github.com/mlc-ai/tokenizers-cpp/blob/main/rust/Cargo.toml#L11
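
For a source checkout, the declared requirement can also be read straight from the manifest text. The sketch below is only illustrative: `SAMPLE_CARGO_TOML` is a hypothetical excerpt (the real `rust/Cargo.toml` may look different), and `pinned_version` is a made-up helper that uses a regular expression rather than a full TOML parser.

```python
import re

# Hypothetical excerpt for illustration; the real rust/Cargo.toml may differ.
SAMPLE_CARGO_TOML = '''
[dependencies]
serde_json = "1.0"
tokenizers = { version = "0.19.1", default-features = false }
'''


def pinned_version(cargo_toml_text, crate="tokenizers"):
    """Return the version string declared for `crate`, or None if absent."""
    # Handles both `crate = "x.y.z"` and `crate = { version = "x.y.z", ... }`.
    pattern = re.compile(
        r'^\s*' + re.escape(crate)
        + r'\s*=\s*(?:"([^"]+)"|\{[^}]*?\bversion\s*=\s*"([^"]+)")',
        re.MULTILINE,
    )
    match = pattern.search(cargo_toml_text)
    if match is None:
        return None
    return match.group(1) or match.group(2)


if __name__ == "__main__":
    print(pinned_version(SAMPLE_CARGO_TOML))
```

A declared requirement is not necessarily the resolved version; `cargo check`, as shown above, reports what is actually built.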

pjyi2147 commented 1 week ago

@MasterJH5574 I did not install mlc-llm from source, but installed via pip. My current version of mlc-llm from pip is the following:

mlc-ai-nightly-cu122              0.18.dev226
mlc-llm-nightly-cu122             0.18.dev61

Is there any more information you would need to investigate further?

MasterJH5574 commented 1 week ago

@pjyi2147 Thank you for sharing this information. It's very helpful. We'll dig deeper to see what's going on.

MasterJH5574 commented 1 week ago

Hi @pjyi2147, we have fixed the issue. It turns out that 0.19.3 is also too old to run the rank_zephyr model. We've bumped it to 0.20.3, so please update the mlc Python package and try again. Thanks!
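
The panic names `ModelWrapper` and is raised while the tokenizer JSON is being parsed, so the `model` section of the model's `tokenizer.json` is a reasonable place to look when a file and a parser version disagree. The sketch below summarizes that section. It rests on assumptions this thread does not confirm: the sample JSON is hypothetical and heavily truncated, `summarize_model_section` is a made-up helper, and the idea that the layout of `merges` entries (strings versus pairs) differs between tokenizers releases is one commonly reported cause, not something established here.

```python
import json

# Hypothetical, heavily truncated tokenizer.json content for illustration.
SAMPLE_TOKENIZER_JSON = json.dumps({
    "version": "1.0",
    "model": {
        "type": "BPE",
        "vocab": {"a": 0, "b": 1, "ab": 2},
        "merges": [["a", "b"]],
    },
})


def summarize_model_section(tokenizer_json_text):
    """Describe the `model` section that the tokenizer parser must decode."""
    data = json.loads(tokenizer_json_text)
    model = data.get("model") or {}
    merges = model.get("merges") or []
    return {
        "model_type": model.get("type"),
        "model_keys": sorted(model.keys()),
        # "str" for entries like "a b", "list" for entries like ["a", "b"].
        "merge_entry_type": type(merges[0]).__name__ if merges else None,
    }


if __name__ == "__main__":
    print(summarize_model_section(SAMPLE_TOKENIZER_JSON))
```

To inspect a real model, read the text of the `tokenizer.json` in the converted model directory and pass it to the function instead of the sample.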

pjyi2147 commented 1 week ago

@MasterJH5574 Do you mean 0.20.3 for the version of the tokenizers package?

MasterJH5574 commented 1 week ago

@pjyi2147 Yes. We've done that here: https://github.com/mlc-ai/tokenizers-cpp/commit/4bb753377680e249345b54c6b10e6d0674c8af03. No other action is needed on your side besides upgrading the mlc Python package.

pjyi2147 commented 1 week ago

Great! I will run the process again over the weekend and update.

pjyi2147 commented 1 week ago

I updated and it works!

My current versions are

mlc-ai-nightly-cu122      0.18.dev246              pypi_0    pypi
mlc-llm-nightly-cu122     0.18.dev69               pypi_0    pypi