mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] TVMError: Check failed: (config["eos_token_id"].is<int64_t>()) is false: #1924

Closed sjtu-scx closed 7 months ago

sjtu-scx commented 7 months ago

🐛 Bug

TVMError: Check failed: (config["eos_token_id"].is<int64_t>()) is false:

When I compile qwen1.5-7B-Chat with the chatml template, the compilation itself completes without problems, but the following error appears when I call the model:

    [2024-03-11 20:35:35] INFO auto_device.py:85: Not found device: cuda:0
    [2024-03-11 20:35:35] INFO auto_device.py:85: Not found device: rocm:0
    [2024-03-11 20:35:36] INFO auto_device.py:85: Not found device: metal:0
    [2024-03-11 20:35:40] INFO auto_device.py:76: Found device: vulkan:0
    [2024-03-11 20:35:41] INFO auto_device.py:85: Not found device: opencl:0
    [2024-03-11 20:35:41] INFO auto_device.py:33: Using device: vulkan:0
    [2024-03-11 20:35:41] INFO chat_module.py:373: Using model folder: C:\Users\sunchenxing\Desktop\mlc_new\dist\qwen1.5-7b-chat-q4f16_1-MLC
    [2024-03-11 20:35:41] INFO chat_module.py:374: Using mlc chat config: C:\Users\sunchenxing\Desktop\mlc_new\dist\qwen1.5-7b-chat-q4f16_1-MLC\mlc-chat-config.json
    [2024-03-11 20:35:41] INFO chat_module.py:516: Using library model: dist/libs/qwen1.5-7b-chat-q4f16_1-vulkan.dll
    [2024-03-11 20:35:42] INFO model_metadata.py:96: Total memory usage: 5058.70 MB (Parameters: 4142.95 MB. KVCache: 384.00 MB. Temporary buffer: 531.75 MB)
    [2024-03-11 20:35:42] INFO model_metadata.py:105: To reduce memory usage, tweak prefill_chunk_size, context_window_size and sliding_window_size
    Traceback (most recent call last):
      File "C:\Users\sunchenxing\Desktop\mlc_new\test.py", line 5, in <module>
        cm = ChatModule(
      File "C:\Users\sunchenxing\.conda\envs\mlc\lib\site-packages\mlc_chat\chat_module.py", line 783, in __init__
        self._reload(self.model_lib_path, self.model_path, user_chat_config_json_str)
      File "C:\Users\sunchenxing\.conda\envs\mlc\lib\site-packages\mlc_chat\chat_module.py", line 1002, in _reload
        self._reload_func(lib, model_path, app_config_json)
      File "C:\Users\sunchenxing\.conda\envs\mlc\lib\site-packages\tvm\_ffi\_ctypes\packed_func.py", line 239, in __call__
        raise_last_ffi_error()
      File "C:\Users\sunchenxing\.conda\envs\mlc\lib\site-packages\tvm\_ffi\base.py", line 481, in raise_last_ffi_error
        raise py_err
    tvm._ffi.base.TVMError: Traceback (most recent call last):
      File "D:\a\package\package\mlc-llm\cpp\llm_chat.cc", line 574
    TVMError: Check failed: (config["eos_token_id"].is<int64_t>()) is false:

Environment

MasterJH5574 commented 7 months ago

Thank you @sjtu-scx for reporting! The failure happens because we require eos_token_id in mlc-chat-config.json to be an integer, but in your case it is not. Could you help me check what the value of eos_token_id is in dist\qwen1.5-7b-chat-q4f16_1-MLC\mlc-chat-config.json?

sjtu-scx commented 7 months ago

Thanks for your patience in replying. Here is my mlc-chat-config.json file; I found that eos_token_id is a list rather than a single value.

mlc-chat-config.json

{ "model_type": "qwen2", "quantization": "q4f16_1", "model_config": { "hidden_act": "silu", "hidden_size": 4096, "intermediate_size": 11008, "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "vocab_size": 151936, "context_window_size": 768, "prefill_chunk_size": 768, "tensor_parallel_shards": 1, "dtype": "float32" }, "vocab_size": 151936, "context_window_size": 768, "sliding_window_size": -1, "prefill_chunk_size": 768, "attention_sink_size": -1, "tensor_parallel_shards": 1, "mean_gen_len": 128, "max_gen_len": 512, "shift_fill_factor": 0.3, "temperature": 0.7, "presence_penalty": 0.0, "frequency_penalty": 0.0, "repetition_penalty": 1.05, "top_p": 0.8, "conv_template": "chatml", "pad_token_id": 151643, "bos_token_id": 151643, "eos_token_id": [ 151645, 151643 ], "tokenizer_files": [ "tokenizer.json", "vocab.json", "merges.txt", "tokenizer_config.json" ], "version": "0.1.0" }

sjtu-scx commented 7 months ago

When I change eos_token_id directly to 151645, the error disappears.
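
In other words, the workaround is a manual edit to the generated mlc-chat-config.json so that the field holds a single integer instead of a list (151645 is simply the first id from the list above; this is a local workaround, not an official fix):

    "eos_token_id": 151645,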

LumenScopeAI commented 7 months ago

Hi, I ran into the same problem and am also trying the same model. How did you set model_lib in app-config.json and --conv-template when compiling?

My configuration:

    {
      "model_url": "",
      "model_lib": "qwen-2_q40f16",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen1.5-1.8B-Chat-q0f16"
    }

    mlc_chat gen_config ./dist/models/$MODEL_NAME/ --quantization $QUANTIZATION \
  --conv-template llama-2 --context-window-size 768 -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/

This setup causes errors.

sjtu-scx commented 7 months ago

Hi, Qwen uses the chatml template. Set --conv-template to chatml; don't use llama-2.
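
For example, reusing the gen_config command posted above with only the template switched (a sketch; the other flags are unchanged):

    mlc_chat gen_config ./dist/models/$MODEL_NAME/ --quantization $QUANTIZATION \
      --conv-template chatml --context-window-size 768 -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/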

LumenScopeAI commented 7 months ago

Thanks a lot! How should model_lib be set? qwen2_q40f16 errors out during deployment.

sjtu-scx commented 7 months ago

You're welcome. Which platform are you deploying to when you run into problems? After mlc_chat gen_config, the next step is to compile the model library for the target device; that is also done from the command line and needs no extra settings.
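
For reference, a rough sketch of that compile step (command form assumed from the MLC LLM docs of that period; exact flags may differ between versions, so check mlc_chat compile --help):

    mlc_chat compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json \
      --device android -o ./dist/libs/${MODEL_NAME}-${QUANTIZATION}-android.tar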

LumenScopeAI commented 7 months ago

I'm trying Android and iOS deployment. When generating the APK, a model_lib has to be specified, and I ran into the same problem as in https://github.com/mlc-ai/mlc-llm/issues/1517.

sjtu-scx commented 7 months ago

I followed the walkthrough at https://github.com/Tao-begd/mlc-llm-android; not sure whether it will help you.

sjtu-scx commented 7 months ago

Here is how I set it up: remove the models you don't need, then add your own model and fill in its paths.

{ "model_list": [ { "model_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC/", "model_lib": "llama_q4f16_1", "estimated_vram_bytes": 4348727787, "model_id": "Llama-2-7b-chat-hf-q4f16_1" }

], "model_lib_path_for_prepare_libs": { "llama_q4f16_1": "Llama-2-7b-chat-hf-q4f16_1-MLC\Llama-2-7b-chat-hf-q4f16_1-android.tar" } } 希望对你有帮助~

tqchen commented 7 months ago

@MasterJH5574 Maybe a good lesson is that we should validate the generated mlc-chat-config.json for the necessary fields in gen_config.

MasterJH5574 commented 7 months ago

@sjtu-scx Thanks for sharing the config! Yes, right now the ChatModule assumes the eos token id is a single token id, which does not hold in this case. We will work on a fix soon.

MasterJH5574 commented 7 months ago

Fixed here https://github.com/mlc-ai/mlc-llm/pull/1940 by removing the need for eos_token_ids. Please wait 1-2 days for the PyPI wheel updates.

MrRace commented 7 months ago

@MasterJH5574 If I set eos_token_id directly to a single value, say 151645, instead of the original list, then recompile the tar file and repackage the APK, installing and running the qwen2 model on the phone freezes the entire system and eventually crashes it, requiring a restart. Have you encountered this issue before?

MasterJH5574 commented 7 months ago

@MrRace Thanks for the question. Do you mean the result is caused by only changing the eos_token_id?

Maybe we can follow up on this in a new issue. Also cc @Kartik14

MasterJH5574 commented 7 months ago

The original issue should have been resolved. Closing this issue for now.

MrRace commented 7 months ago

@MasterJH5574 Thanks a lot for your reply. What I mean is: if we simply change the original list value of eos_token_id to a single value, it no longer triggers the previous error (TVMError: Check failed: (config["eos_token_id"].is<int64_t>()) is false:), but when I type dialogue text into the input box, the phone crashes and reboots.