mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] TVMError: Check failed: (config["eos_token_id"].is<int64_t>()) is false: #1924

Closed sjtu-scx closed 7 months ago

sjtu-scx commented 7 months ago

🐛 Bug

TVMError: Check failed: (config["eos_token_id"].is<int64_t>()) is false:

When I compile qwen1.5-7B-Chat with the chatml template, the compilation itself completes without problems, but the following error appears when I call the model:

    [2024-03-11 20:35:35] INFO auto_device.py:85: Not found device: cuda:0
    [2024-03-11 20:35:35] INFO auto_device.py:85: Not found device: rocm:0
    [2024-03-11 20:35:36] INFO auto_device.py:85: Not found device: metal:0
    [2024-03-11 20:35:40] INFO auto_device.py:76: Found device: vulkan:0
    [2024-03-11 20:35:41] INFO auto_device.py:85: Not found device: opencl:0
    [2024-03-11 20:35:41] INFO auto_device.py:33: Using device: vulkan:0
    [2024-03-11 20:35:41] INFO chat_module.py:373: Using model folder: C:\Users\sunchenxing\Desktop\mlc_new\dist\qwen1.5-7b-chat-q4f16_1-MLC
    [2024-03-11 20:35:41] INFO chat_module.py:374: Using mlc chat config: C:\Users\sunchenxing\Desktop\mlc_new\dist\qwen1.5-7b-chat-q4f16_1-MLC\mlc-chat-config.json
    [2024-03-11 20:35:41] INFO chat_module.py:516: Using library model: dist/libs/qwen1.5-7b-chat-q4f16_1-vulkan.dll
    [2024-03-11 20:35:42] INFO model_metadata.py:96: Total memory usage: 5058.70 MB (Parameters: 4142.95 MB. KVCache: 384.00 MB. Temporary buffer: 531.75 MB)
    [2024-03-11 20:35:42] INFO model_metadata.py:105: To reduce memory usage, tweak prefill_chunk_size, context_window_size and sliding_window_size
    Traceback (most recent call last):
      File "C:\Users\sunchenxing\Desktop\mlc_new\test.py", line 5, in <module>
        cm = ChatModule(
      File "C:\Users\sunchenxing\.conda\envs\mlc\lib\site-packages\mlc_chat\chat_module.py", line 783, in __init__
        self._reload(self.model_lib_path, self.model_path, user_chat_config_json_str)
      File "C:\Users\sunchenxing\.conda\envs\mlc\lib\site-packages\mlc_chat\chat_module.py", line 1002, in _reload
        self._reload_func(lib, model_path, app_config_json)
      File "C:\Users\sunchenxing\.conda\envs\mlc\lib\site-packages\tvm\_ffi\_ctypes\packed_func.py", line 239, in __call__
        raise_last_ffi_error()
      File "C:\Users\sunchenxing\.conda\envs\mlc\lib\site-packages\tvm\_ffi\base.py", line 481, in raise_last_ffi_error
        raise py_err
    tvm._ffi.base.TVMError: Traceback (most recent call last):
      File "D:\a\package\package\mlc-llm\cpp\llm_chat.cc", line 574
    TVMError: Check failed: (config["eos_token_id"].is<int64_t>()) is false:

Environment

MasterJH5574 commented 7 months ago

Thank you @sjtu-scx for reporting! The failure happens because we require eos_token_id in mlc-chat-config.json to be an integer, but in your case it is not. Could you help me check what the value of eos_token_id is in dist\qwen1.5-7b-chat-q4f16_1-MLC\mlc-chat-config.json?

sjtu-scx commented 7 months ago

Thanks for your patience in replying. Here is my mlc-chat-config.json file; I found that eos_token_id is a list rather than a single value.

mlc-chat-config.json

{ "model_type": "qwen2", "quantization": "q4f16_1", "model_config": { "hidden_act": "silu", "hidden_size": 4096, "intermediate_size": 11008, "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "vocab_size": 151936, "context_window_size": 768, "prefill_chunk_size": 768, "tensor_parallel_shards": 1, "dtype": "float32" }, "vocab_size": 151936, "context_window_size": 768, "sliding_window_size": -1, "prefill_chunk_size": 768, "attention_sink_size": -1, "tensor_parallel_shards": 1, "mean_gen_len": 128, "max_gen_len": 512, "shift_fill_factor": 0.3, "temperature": 0.7, "presence_penalty": 0.0, "frequency_penalty": 0.0, "repetition_penalty": 1.05, "top_p": 0.8, "conv_template": "chatml", "pad_token_id": 151643, "bos_token_id": 151643, "eos_token_id": [ 151645, 151643 ], "tokenizer_files": [ "tokenizer.json", "vocab.json", "merges.txt", "tokenizer_config.json" ], "version": "0.1.0" }

sjtu-scx commented 7 months ago

When I change eos_token_id directly to 151645, the error disappears.
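
In other words, the workaround is a manual edit to the generated mlc-chat-config.json so that the field holds a single integer instead of a list (151645 is simply the first id from the list above; this is a local workaround, not an official fix):

    "eos_token_id": 151645,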

LumenScopeAI commented 7 months ago

Hi, I ran into the same problem and am also trying the same model. How did you set model_lib in app-config.json and --conv-template when compiling?

My configuration:

    {
      "model_url": "",
      "model_lib": "qwen-2_q40f16",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Qwen1.5-1.8B-Chat-q0f16"
    }

    mlc_chat gen_config ./dist/models/$MODEL_NAME/ --quantization $QUANTIZATION \
  --conv-template llama-2 --context-window-size 768 -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/

This setup causes errors.

sjtu-scx commented 7 months ago

Hi, Qwen uses the chatml template. Set --conv-template to chatml; don't use llama-2.
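
For example, reusing the gen_config command posted above with only the template switched (a sketch; the other flags are unchanged):

    mlc_chat gen_config ./dist/models/$MODEL_NAME/ --quantization $QUANTIZATION \
      --conv-template chatml --context-window-size 768 -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/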

LumenScopeAI commented 7 months ago

Thanks a lot! How should model_lib be set? qwen2_q40f16 errors out during deployment.

sjtu-scx commented 7 months ago

You're welcome. Which platform are you deploying to when you run into problems? After mlc_chat gen_config, the next step is to compile the model library for the target device; that is also done from the command line and needs no extra settings.
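
For reference, a rough sketch of that compile step (command form assumed from the MLC LLM docs of that period; exact flags may differ between versions, so check mlc_chat compile --help):

    mlc_chat compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json \
      --device android -o ./dist/libs/${MODEL_NAME}-${QUANTIZATION}-android.tar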

LumenScopeAI commented 7 months ago

I'm trying Android and iOS deployment. When generating the APK, a model_lib has to be specified, and I ran into the same problem as in https://github.com/mlc-ai/mlc-llm/issues/1517.

sjtu-scx commented 7 months ago

I followed the walkthrough at https://github.com/Tao-begd/mlc-llm-android; not sure whether it will help you.

sjtu-scx commented 7 months ago

Here is how I set it up: remove the models you don't need, then add your own model and fill in its paths.

{ "model_list": [ { "model_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC/", "model_lib": "llama_q4f16_1", "estimated_vram_bytes": 4348727787, "model_id": "Llama-2-7b-chat-hf-q4f16_1" }

], "model_lib_path_for_prepare_libs": { "llama_q4f16_1": "Llama-2-7b-chat-hf-q4f16_1-MLC\Llama-2-7b-chat-hf-q4f16_1-android.tar" } } 希望对你有帮助~

tqchen commented 7 months ago

@MasterJH5574 Maybe a good lesson is that we should validate the generated mlc-chat-config.json for the necessary fields in gen_config.

MasterJH5574 commented 7 months ago

@sjtu-scx Thanks for sharing the config! Yes, right now the ChatModule assumes the eos token id is a single token id, which does not hold in this case. We will work on a fix soon.

MasterJH5574 commented 7 months ago

Fixed here https://github.com/mlc-ai/mlc-llm/pull/1940 by removing the need for eos_token_ids. Please wait 1-2 days for the PyPI wheel updates.

MrRace commented 7 months ago

@MasterJH5574 If I set eos_token_id directly to a single value, say 151645, instead of the original list, then recompile the tar file and repackage the APK, installing and running the qwen2 model on the phone freezes the entire system and eventually crashes it, requiring a restart. Have you encountered this issue before?

MasterJH5574 commented 7 months ago

@MrRace Thanks for the question. Do you mean the result is caused by only changing the eos_token_id?

Maybe we can follow up on this in a new issue. Also cc @Kartik14

MasterJH5574 commented 7 months ago

The original issue should have been resolved. Closing this issue for now.

MrRace commented 7 months ago

@MasterJH5574 Thanks a lot for your reply. What I mean is: if we simply change the original list value of eos_token_id to a single value, it no longer triggers the previous error (TVMError: Check failed: (config["eos_token_id"].is<int64_t>()) is false:), but when I type dialogue text into the input box, the phone crashes and reboots.