mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] org.apache.tvm.Base$TVMError Check failed: token_id < static_cast<int>(token_table_.size()) #2592

Closed · panghongtao closed this issue 3 months ago

panghongtao commented 3 months ago

🐛 Bug

```
AndroidRuntime: FATAL EXCEPTION: Thread-5
AndroidRuntime: Process: ai.mlc.mlcchat, PID: 24908
AndroidRuntime: org.apache.tvm.Base$TVMError: [11:26:35] E:/project/mlc-llm/cpp/tokenizers/streamer.cc:193: InternalError: Check failed: token_id < static_cast<int>(token_table_.size()) (153685 vs. 151646) :
AndroidRuntime: Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.
AndroidRuntime:
AndroidRuntime:     at org.apache.tvm.Base.checkCall(Base.java:173)
AndroidRuntime:     at org.apache.tvm.Function.invoke(Function.java:130)
AndroidRuntime:     at ai.mlc.mlcllm.JSONFFIEngine.runBackgroundLoop(JSONFFIEngine.java:64)
AndroidRuntime:     at ai.mlc.mlcllm.MLCEngine$backgroundWorker$1.invoke(MLCEngine.kt:42)
AndroidRuntime:     at ai.mlc.mlcllm.MLCEngine$backgroundWorker$1.invoke(MLCEngine.kt:40)
AndroidRuntime:     at ai.mlc.mlcllm.BackgroundWorker$start$1.invoke(MLCEngine.kt:19)
AndroidRuntime:     at ai.mlc.mlcllm.BackgroundWorker$start$1.invoke(MLCEngine.kt:18)
AndroidRuntime:     at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
```

To Reproduce

I copied the converted model files directly into the files folder of the Android MLC app. When I type "Who am I?" I can see part of a reply being generated, but before the reply finishes the app crashes and closes. Monitoring the app shows the error above. How can I troubleshoot this error? What do I need to do?

tqchen commented 3 months ago

It seems that the model has a large token table, but the tokenizer files you copied do not contain as many tokens.
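
A quick way to verify this is to compare the vocabulary size the tokenizer files define with the `vocab_size` the model weights expect. Below is a minimal sketch, assuming the HuggingFace `tokenizers` package is installed and that `MODEL_DIR` (a hypothetical path) points at the fine-tuned model directory:

```python
import json
from tokenizers import Tokenizer

MODEL_DIR = "path/to/your/model"  # hypothetical: the fine-tuned model directory

# Vocabulary the tokenizer can actually decode, including added tokens.
tok = Tokenizer.from_file(f"{MODEL_DIR}/tokenizer.json")
tokenizer_vocab = tok.get_vocab_size(with_added_tokens=True)

# Vocabulary the model weights/config expect.
with open(f"{MODEL_DIR}/config.json", encoding="utf-8") as f:
    model_vocab = json.load(f)["vocab_size"]

print(f"tokenizer.json vocab (with added tokens): {tokenizer_vocab}")
print(f"config.json vocab_size:                   {model_vocab}")
if model_vocab > tokenizer_vocab:
    print("Mismatch: the model can emit token ids the tokenizer cannot decode.")
```

In this crash, the model emitted token id 153685 while the token table held only 151646 entries, which is exactly the kind of gap this comparison would expose.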

panghongtao commented 3 months ago

> It seems that the model has a large token table, but the tokenizer files you copied do not contain as many tokens.

So is this a conversion error, or is the original model faulty? How can I deal with or fix this problem?

tqchen commented 3 months ago

Are you using a newly fine-tuned model or an original one? I think this can happen when you use a new fine-tune with a larger token table while the tokenizer file is still the original one. For example, it could happen if you copied only the weight files and not the related tokenizer files.

If you have an example model that can reproduce the error, it would help us dig further.

You likely don't need to run on Android to reproduce the problem; running the same model through the Python API should also trigger it.
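
For example, following the MLC LLM Python API quickstart, something like the sketch below should hit the same check without an Android build (the local model path is an assumption; substitute your converted output directory):

```python
from mlc_llm import MLCEngine

# Hypothetical path: the converted MLC model directory produced by the
# mlc_llm convert_weight / gen_config steps.
model = "dist/my-model-MLC"

engine = MLCEngine(model)

# Stream a reply; a vocab/tokenizer mismatch should raise the same
# "token_id < token_table_.size()" check failure seen on Android.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Who am I?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```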

panghongtao commented 3 months ago

Yes, this is not an original model; it is a fine-tuned (adjusted) model. I looked at the configuration files produced by the conversion and compared the ones that contain the token names.

My model files:

- tokenizer.json
- added_tokens.json
- tokenizer_config.json
- vocab.json
- special_tokens_map.json
- model.safetensors.index.json
- generation_config.json
- config.json

MLC output files:

- tokenizer.json
- added_tokens.json
- tokenizer_config.json
- vocab.json
- mlc-chat-config.json
- ndarray-cache.json

The four configuration files tokenizer.json, added_tokens.json, tokenizer_config.json, and vocab.json are identical before and after conversion; they do not change.

Regarding what you said about tokenizer.json: this file is identical before and after conversion. Is that expected? Given my model converted to MLC as above, is this set of configuration files correct?

tqchen commented 3 months ago

Likely you need to make sure the combined tokenizer file tokenizer.json contains the new tokens. It won't change during conversion.

You need to make sure the new tokenizer.json gets copied into your app, since it seems likely that your tokenizer.json still contains 151646 tokens (the old tokenizer file?), while your new model configuration has a larger vocabulary than that.
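
To see how many tokens the tokenizer.json bundled with the app actually defines, you can count its entries directly. Here is a minimal sketch, assuming the HuggingFace fast-tokenizer JSON layout that Qwen-style BPE models use (a `model.vocab` map plus an `added_tokens` list):

```python
import json

# Hypothetical path: the tokenizer.json that was copied into the app bundle.
with open("tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

ids = set(tok["model"]["vocab"].values())                 # base BPE vocabulary ids
ids.update(t["id"] for t in tok.get("added_tokens", []))  # special/added token ids

print(f"distinct token ids: {len(ids)}, max id: {max(ids)}")
# If this reports 151646 while the model's vocab_size is larger,
# the app is still carrying the old tokenizer file.
```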

panghongtao commented 3 months ago

> Likely you need to make sure the combined tokenizer file tokenizer.json contains the new tokens. It won't change during conversion.
>
> You need to make sure the new tokenizer.json gets copied into your app, since it seems likely that your tokenizer.json still contains 151646 tokens (the old tokenizer file?), while your new model configuration has a larger vocabulary than that.

This model's token count was increased, but the added tokens are recorded in other config files. How can I solve this problem? Is there any way to fix this error? Right now the APK built from my model is unusable on Android: this error occurs and the app exits.

tqchen commented 3 months ago

It might be useful for you to run it in Python first; I still think it is likely the config file got messed up.

panghongtao commented 3 months ago

> It might be useful for you to run it in Python first; I still think it is likely the config file got messed up.

OK. If we discover the cause of this problem later, I'll let you know. This issue can be closed.

panghongtao commented 3 months ago

> It might be useful for you to run it in Python first; I still think it is likely the config file got messed up.

```
[2024-06-27 16:35:35] INFO auto_config.py:116: Found model configuration: E:\mlc-model\Qwen-1.5-jiamu-20240627\config.json
[2024-06-27 16:35:35] INFO auto_config.py:154: Found model type: qwen2. Use --model-type to override.
[2024-06-27 16:35:35] INFO qwen2_model.py:49: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-06-27 16:35:35] INFO qwen2_model.py:66: prefill_chunk_size defaults to 2048
[2024-06-27 16:35:35] INFO config.py:107: Overriding max_batch_size from 1 to 80
[2024-06-27 16:35:35] INFO gen_config.py:143: [generation_config.json] Setting bos_token_id: 151643
[2024-06-27 16:35:35] INFO gen_config.py:143: [generation_config.json] Setting eos_token_id: [151645, 151643]
[2024-06-27 16:35:35] INFO gen_config.py:143: [generation_config.json] Setting pad_token_id: 151643
[2024-06-27 16:35:35] INFO gen_config.py:143: [generation_config.json] Setting repetition_penalty: 1.1
[2024-06-27 16:35:35] INFO gen_config.py:143: [generation_config.json] Setting top_p: 0.8
[2024-06-27 16:35:35] INFO gen_config.py:157: Not found tokenizer config: E:\mlc-model\Qwen-1.5-jiamu-20240627\tokenizer.model
[2024-06-27 16:35:35] INFO gen_config.py:155: Found tokenizer config: E:\mlc-model\Qwen-1.5-jiamu-20240627\tokenizer.json. Copying to dist\Qwen-1.5-jiamu-20240627\tokenizer.json
[2024-06-27 16:35:35] INFO gen_config.py:155: Found tokenizer config: E:\mlc-model\Qwen-1.5-jiamu-20240627\vocab.json. Copying to dist\Qwen-1.5-jiamu-20240627\vocab.json
[2024-06-27 16:35:35] INFO gen_config.py:155: Found tokenizer config: E:\mlc-model\Qwen-1.5-jiamu-20240627\merges.txt. Copying to dist\Qwen-1.5-jiamu-20240627\merges.txt
[2024-06-27 16:35:35] INFO gen_config.py:155: Found tokenizer config: E:\mlc-model\Qwen-1.5-jiamu-20240627\added_tokens.json. Copying to dist\Qwen-1.5-jiamu-20240627\added_tokens.json
[2024-06-27 16:35:35] INFO gen_config.py:155: Found tokenizer config: E:\mlc-model\Qwen-1.5-jiamu-20240627\tokenizer_config.json. Copying to dist\Qwen-1.5-jiamu-20240627\tokenizer_config.json
[2024-06-27 16:35:35] INFO gen_config.py:216: Detected tokenizer info: {'token_postproc_method': 'byte_level', 'prepend_space_in_encode': False, 'strip_space_in_decode': False}
[2024-06-27 16:35:35] INFO gen_config.py:32: [System default] Setting temperature: 1.0
[2024-06-27 16:35:35] INFO gen_config.py:32: [System default] Setting presence_penalty: 0.0
[2024-06-27 16:35:35] INFO gen_config.py:32: [System default] Setting frequency_penalty: 0.0
[2024-06-27 16:35:35] INFO gen_config.py:223: Dumping configuration file to: dist\Qwen-1.5-jiamu-20240627\mlc-chat-config.json
```

When I converted the configuration, one file could not be found (tokenizer.model, per the log above). Does a missing file affect the generated configuration? The official Qwen1.5-1.8B-Chat model I downloaded also does not ship a tokenizer.model file. Does this message indicate a problem for the model?

panghongtao commented 3 months ago

> It might be useful for you to run it in Python first; I still think it is likely the config file got messed up.

The problem has been solved. We redid the conversion without the extra added-token configuration and changed the total number of tokens. My colleague told me this error may be related to whether the total token count is divisible by 16, though I'm not sure about that. Thank you for your replies. I think this issue can be closed.
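
For reference, some deployment stacks pad the embedding/vocabulary table up to a fixed alignment for GPU efficiency, which is presumably what the divisible-by-16 observation refers to; this is an unconfirmed assumption, not a documented MLC requirement. The arithmetic for the token counts in this issue:

```python
def round_up(n: int, align: int = 16) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

# Token table size from the crash message (divisibility by 16 is the
# colleague's hypothesis above, not a confirmed MLC constraint):
print(151646 % 16)        # 14 -> not aligned to 16
print(round_up(151646))   # 151648 -> nearest multiple of 16
# The offending id (153685) lay well beyond the 151646-entry token table,
# so decoding such an id trips the check in streamer.cc.
```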