dusihuaxin closed this issue 6 months ago.
I think this problem can be set aside for now. I found that compiling to a DLL also produces two additional files (.exp and .lib). The model now loads and runs normally.
However, the error during conversion still occurs:
ValueError: The block no longer exists in the IRModule
Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.
The first time I converted, I put the MLC and DLL files in the same folder for convenience. Based on the tutorial, I assumed only the DLL file mattered and didn't know what the other generated files were for.
Still, I hope the maintainers can track down the cause of the error during compilation on Windows. This time I followed the official documentation end to end, from MLC conversion to DLL generation, using the Qwen1.5-0.5B model.
Thank you @dusihuaxin for reporting. We will look into this.
This was caused by the prefill_chunk_size setting; reducing it should resolve the issue.
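One way to act on this advice is to lower prefill_chunk_size in the model's mlc-chat-config.json and then re-run mlc_llm compile. The sketch below assumes the file is a JSON object with a top-level "prefill_chunk_size" key; the helper name lower_prefill_chunk_size is hypothetical and not part of MLC LLM.

```python
import json
from pathlib import Path


def lower_prefill_chunk_size(config_path: str, new_size: int) -> int:
    """Overwrite prefill_chunk_size in an mlc-chat-config.json file.

    Hypothetical helper. Assumes the file is a JSON object with a top-level
    "prefill_chunk_size" key; returns the previous value, or -1 if the key
    was missing.
    """
    if new_size <= 0:
        raise ValueError("prefill_chunk_size must be a positive integer")
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_size = config.get("prefill_chunk_size", -1)
    config["prefill_chunk_size"] = new_size
    # Rewrites the whole file; ensure_ascii=False keeps any non-ASCII text readable.
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return old_size
```

For example, call lower_prefill_chunk_size(r"E:\Qwen1.5-MoE-A2.7B-Chat-q4f16_1-MLC\mlc-chat-config.json", 1024) and then run the same compile command again; 1024 is only an illustrative value. A smaller chunk should shrink the temporary buffers reported in the compile log, at the cost of splitting long prompts into more prefill steps.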
Thank you for the info.
🐛 Bug
I downloaded the model mlc-ai/Qwen1.5-MoE-A2.7B-Chat-q4f16_1-MLC, provided by mlc-ai, from Hugging Face. First, I compiled the MLC model into a DLL with this command:
mlc_llm compile E:\Qwen1.5-MoE-A2.7B-Chat-q4f16_1-MLC/mlc-chat-config.json --device vulkan -o E:\libs\Qwen1.5-MoE-A2.7B-Chat-q4f16_1-vulkan.dll
Although compilation completes, the following error is printed during the process (the same message appears three times at 18:26:27):
[18:26:27] D:\a\package\package\tvm\src\tir\ir\stmt.cc:122: InternalError: Check failed: (e.dtype().bits() <= loop_var.dtype().bits()) is false: Loop variable's dtype (int32) is narrower than that of min or extent (int64)
Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.

Here is an excerpt of the rest of the output:
[2024-04-11 18:26:27] INFO pipeline.py:50: Lowering to VM bytecode
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function alloc_embedding_tensor: 16.00 MB
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function batch_decode: 0.64 MB
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function batch_prefill: 236.58 MB
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function batch_verify: 2610.00 MB
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function create_tir_paged_kv_cache: 0.00 MB
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function decode: 0.64 MB
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function embed: 16.00 MB
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function prefill: 236.58 MB
[2024-04-11 18:26:29] INFO estimate_memory_usage.py:57: [Memory usage] Function softmax_with_temperature: 0.00 MB
[2024-04-11 18:26:29] INFO pipeline.py:50: Compiling external modules
[2024-04-11 18:26:29] INFO pipeline.py:50: Compilation complete! Exporting to disk
[2024-04-11 18:26:37] INFO model_metadata.py:96: Total memory usage: 3605.82 MB (Parameters: 995.82 MB. KVCache: 0.00 MB. Temporary buffer: 2610.00 MB)
[2024-04-11 18:26:37] INFO model_metadata.py:105: To reduce memory usage, tweak prefill_chunk_size, context_window_size and sliding_window_size
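The memory summary at the end of the compile log can be checked by hand. The sketch below reproduces the reported total under the assumption (consistent with these numbers, but not confirmed from the MLC LLM source) that the temporary buffer equals the largest single per-function estimate, which here is batch_verify.

```python
# Per-function temporary-buffer estimates (MB), copied from the compile log.
function_temp_mb = {
    "alloc_embedding_tensor": 16.00,
    "batch_decode": 0.64,
    "batch_prefill": 236.58,
    "batch_verify": 2610.00,
    "create_tir_paged_kv_cache": 0.00,
    "decode": 0.64,
    "embed": 16.00,
    "prefill": 236.58,
    "softmax_with_temperature": 0.00,
}
params_mb = 995.82   # "Parameters" in the summary line
kv_cache_mb = 0.00   # "KVCache" in the summary line

# Assumption: the temporary buffer is the maximum over functions, not the sum.
largest_fn = max(function_temp_mb, key=function_temp_mb.get)
temp_buffer_mb = function_temp_mb[largest_fn]
total_mb = params_mb + kv_cache_mb + temp_buffer_mb

print(f"largest: {largest_fn} ({temp_buffer_mb:.2f} MB)")
print(f"total: {total_mb:.2f} MB")
```

This prints batch_verify (2610.00 MB) and a total of 3605.82 MB, matching the log, which shows that one function's temporary buffer accounts for most of the estimated footprint.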
Then I used the following code to load the two files, and it reported that a parameter could not be found in the model cache. The errors in both places end with the same line, "Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time", but since I haven't looked at your source code, I don't know what is actually going wrong.
from mlc_llm import ChatModule

cm = ChatModule(
    model=r"E:\Qwen1.5-MoE-A2.7B-Chat-q4f16_1-MLC",
    model_lib_path=r"E:\libs\Qwen1.5-MoE-A2.7B-Chat-q4f16_1-vulkan.dll",
)

This fails with:
[18:28:11] D:\a\package\package\tvm\src\runtime\relax_vm\ndarray_cache_support.cc:333: ValueError: Cannot find parameter in cache: model.layers.0.mlp.gate_up_proj.q_weight
Stack trace not available when DMLC_LOG_STACK_TRACE is disabled at compile time.
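When the model library asks for a parameter the cache does not have, one way to narrow it down is to check whether the weight directory actually lists that name. The sketch below reads ndarray-cache.json from the model directory; the file name and the nested "records" layout are assumptions about how MLC weight folders are laid out, so verify them against your own download. The helpers list_param_names and report are hypothetical.

```python
import json
from pathlib import Path


def list_param_names(model_dir: str) -> list[str]:
    """Collect every parameter name recorded in <model_dir>/ndarray-cache.json.

    Assumed layout: {"records": [{"dataPath": ..., "records": [{"name": ...}, ...]}, ...]}
    """
    cache_path = Path(model_dir) / "ndarray-cache.json"
    cache = json.loads(cache_path.read_text(encoding="utf-8"))
    names = []
    for shard in cache.get("records", []):      # one entry per weight shard file
        for param in shard.get("records", []):  # one entry per tensor in that shard
            names.append(param["name"])
    return names


def report(model_dir: str, missing: str) -> None:
    """Print whether `missing` is in the cache, plus same-block names that are."""
    names = list_param_names(model_dir)
    print(f"{missing!r} present: {missing in names}")
    # e.g. "model.layers.0.mlp.gate_up_proj.q_weight" -> "model.layers.0.mlp"
    prefix = missing.rsplit(".", 2)[0]
    for name in names:
        if name.startswith(prefix + "."):
            print("  ", name)
```

For example, report(r"E:\Qwen1.5-MoE-A2.7B-Chat-q4f16_1-MLC", "model.layers.0.mlp.gate_up_proj.q_weight"). If the name is absent but other names under the same prefix exist, the DLL may have been compiled for a different parameter layout than the downloaded weights use.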