jackuh105 opened 1 month ago
Hi @jackuh105, thanks for reporting and sharing the detailed error message. From the backtrace it looks like the actual error happens during download:
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\support\download_cache.py", line 228, in get_or_download_model
model_path = download_and_cache_mlc_weights(model)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\support\download_cache.py", line 192, in download_and_cache_mlc_weights
futures.append(executor.submit(download_file, file_url, file_dest, file_md5))
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\concurrent\futures\process.py", line 732, in submit
self._adjust_process_count()
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\concurrent\futures\process.py", line 692, in _adjust_process_count
self._spawn_process()
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\concurrent\futures\process.py", line 709, in _spawn_process
p.start()
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
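For reference, the idiom the message refers to looks roughly like this (a minimal sketch; run() is a hypothetical placeholder for whatever the script actually does):
# A minimal sketch of the main-module idiom named in the error above.
# run() is a hypothetical placeholder for the script's actual work.
import multiprocessing

def run():
    ...  # anything that may spawn worker processes (e.g. parallel downloads)

if __name__ == "__main__":
    multiprocessing.freeze_support()  # only needed if frozen into an executable
    run()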
In this case, could you make sure that you have git-lfs
installed and try again? If it still doesn't work, you can alternatively clone the model manually and then run the updated demo.py
below against the downloaded repo.
git lfs install
git clone https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
# demo.py
from mlc_llm import MLCEngine
# Create engine
model = "./Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)
# Run chat completion via the OpenAI-compatible API.
for response in engine.chat.completions.create(
messages=[{"role": "user", "content": "What is the meaning of life?"}],
model=model,
stream=True,
):
for choice in response.choices:
print(choice.delta.content, end="", flush=True)
print("\n")
engine.terminate()
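Note that the relative model path above assumes demo.py is run from the directory that contains the cloned folder; an absolute path to the clone also works.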
We will try to update our error message to avoid this confusion. Thanks for bringing it up.
@MasterJH5574 Thank you for your support. I already have git-lfs
installed. After changing the model path to the git-cloned one, the script runs longer before the error occurs:
[23:07:53] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[23:07:53] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[23:07:53] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[2024-10-17 23:07:57] INFO auto_device.py:88: Not found device: cuda:0
[2024-10-17 23:07:58] INFO auto_device.py:88: Not found device: rocm:0
[2024-10-17 23:07:59] INFO auto_device.py:88: Not found device: metal:0
[2024-10-17 23:08:03] INFO auto_device.py:79: Found device: vulkan:0
[2024-10-17 23:08:04] INFO auto_device.py:88: Not found device: opencl:0
[2024-10-17 23:08:04] INFO auto_device.py:35: Using device: vulkan:0
[2024-10-17 23:08:05] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-10-17 23:08:05] INFO jit.py:118: Compiling using commands below:
[2024-10-17 23:08:05] INFO jit.py:119: 'D:\projects\llm-app\mlc-llm\.env\Scripts\python.exe' -m mlc_llm compile Llama-3-8B-Instruct-q4f16_1-MLC --opt 'flashinfer=1;cublas_gemm=1;faster_transformer=0;cudagraph=1;cutlass=1;ipc_allreduce_strategy=NONE' --overrides '' --device vulkan:0 --output 'C:\Users\jackc\AppData\Local\Temp\tmpxket0nak\lib.dll'
[23:08:05] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[23:08:05] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[23:08:05] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[2024-10-17 23:08:06] INFO auto_config.py:70: Found model configuration: Llama-3-8B-Instruct-q4f16_1-MLC\mlc-chat-config.json
[2024-10-17 23:08:06] INFO auto_target.py:91: Detecting target device: vulkan:0
[2024-10-17 23:08:06] INFO auto_target.py:93: Found target: {"thread_warp_size": runtime.BoxInt(1), "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(49152), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
[2024-10-17 23:08:06] INFO auto_target.py:110: Found host LLVM triple: x86_64-pc-windows-msvc
[2024-10-17 23:08:06] INFO auto_target.py:111: Found host LLVM CPU: alderlake
[2024-10-17 23:08:06] INFO auto_config.py:154: Found model type: llama. Use `--model-type` to override.
Compiling with arguments:
--config LlamaConfig(hidden_size=4096, intermediate_size=14336, num_attention_heads=32, num_hidden_layers=32, rms_norm_eps=1e-05, vocab_size=128256, tie_word_embeddings=False, position_embedding_base=500000.0, rope_scaling=None, context_window_size=8192, prefill_chunk_size=8192, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, pipeline_parallel_stages=1, max_batch_size=128, kwargs={})
--quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7, tensor_parallel_shards=0)
--model-type llama
--target {"thread_warp_size": runtime.BoxInt(1), "host": {"mtriple": "x86_64-pc-windows-msvc", "tag": "", "kind": "llvm", "mcpu": "alderlake", "keys": ["cpu"]}, "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(49152), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
--opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
--system-lib-prefix ""
--output C:\Users\jackc\AppData\Local\Temp\tmpxket0nak\lib.dll
--overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None
[2024-10-17 23:08:06] INFO compile.py:140: Creating model from: LlamaConfig(hidden_size=4096, intermediate_size=14336, num_attention_heads=32, num_hidden_layers=32, rms_norm_eps=1e-05, vocab_size=128256, tie_word_embeddings=False, position_embedding_base=500000.0, rope_scaling=None, context_window_size=8192, prefill_chunk_size=8192, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, pipeline_parallel_stages=1, max_batch_size=128, kwargs={})
[2024-10-17 23:08:06] INFO compile.py:158: Exporting the model to TVM Unity compiler
[2024-10-17 23:08:10] INFO compile.py:164: Running optimizations using TVM Unity
[2024-10-17 23:08:10] INFO compile.py:185: Registering metadata: {'model_type': 'llama', 'quantization': 'q4f16_1', 'context_window_size': 8192, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 8192, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'kv_state_kind': 'kv_cache', 'max_batch_size': 128}
[2024-10-17 23:08:12] INFO pipeline.py:54: Running TVM Relax graph-level optimizations
[2024-10-17 23:08:17] INFO pipeline.py:54: Lowering to TVM TIR kernels
[2024-10-17 23:08:26] INFO pipeline.py:54: Running TVM TIR-level optimizations
[2024-10-17 23:08:45] INFO pipeline.py:54: Running TVM Dlight low-level optimizations
[2024-10-17 23:08:47] INFO pipeline.py:54: Lowering to VM bytecode
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `alloc_embedding_tensor`: 64.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `argsort_probs`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_decode`: 18.50 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_decode_to_last_hidden_states`: 19.50 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_prefill`: 1185.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_prefill_to_last_hidden_states`: 1248.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_select_last_hidden_states`: 1.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_verify`: 1184.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_verify_to_last_hidden_states`: 1248.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `decode`: 0.14 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `decode_to_last_hidden_states`: 0.15 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `embed`: 64.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `gather_hidden_states`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `get_logits`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `multinomial_from_uniform`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `prefill`: 1184.01 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `prefill_to_last_hidden_states`: 1248.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `renormalize_by_top_p`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `sample_with_top_p`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `sampler_take_probs`: 0.01 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `sampler_verify_draft_tokens`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `scatter_hidden_states`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-10-17 23:08:53] INFO pipeline.py:54: Compiling external modules
[2024-10-17 23:08:53] INFO pipeline.py:54: Compilation complete! Exporting to disk
Traceback (most recent call last):
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\__main__.py", line 64, in <module>
main()
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\__main__.py", line 33, in main
cli.main(sys.argv[2:])
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\cli\compile.py", line 129, in main
compile(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\interface\compile.py", line 243, in compile
_compile(args, model_config)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\interface\compile.py", line 188, in _compile
args.build_func(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\support\auto_target.py", line 316, in build
).export_library(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\tvm\relax\vm_build.py", line 146, in export_library
return self.mod.export_library(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\tvm\runtime\module.py", line 628, in export_library
return fcompile(file_name, files, **kwargs)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\tvm\contrib\cc.py", line 96, in create_shared
_windows_compile(output, objects, options, cwd, ccache_env)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\tvm\contrib\cc.py", line 418, in _windows_compile
raise RuntimeError(msg)
RuntimeError: Compilation error:
clang -O2 --target=x86_64 -shared -o C:\Users\jackc\AppData\Local\Temp\tmpxket0nak\lib.dll C:\Users\jackc\AppData\Local\Temp\tmphif53jgb\lib0.o C:\Users\jackc\AppData\Local\Temp\tmphif53jgb\devc.o
C:\Users\jackc\AppData\Local\Temp\tmphif53jgb\lib0.o: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
clang: error: linker (via gcc) command failed with exit code 1 (use -v to see invocation)
Traceback (most recent call last):
File "D:\projects\llm-app\mlc-llm\demo2.py", line 5, in <module>
engine = MLCEngine(model)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine.py", line 1467, in __init__
super().__init__(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 590, in __init__
) = _process_model_args(models, device, engine_config)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 171, in _process_model_args
model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 171, in <listcomp>
model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 164, in _convert_model_info
model_lib = jit.jit(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\interface\jit.py", line 164, in jit
_run_jit(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\interface\jit.py", line 124, in _run_jit
raise RuntimeError("Cannot find compilation output, compilation failed")
RuntimeError: Cannot find compilation output, compilation failed
Exception ignored in: <function MLCEngineBase.__del__ at 0x000002034B8EACB0>
Traceback (most recent call last):
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 654, in __del__
self.terminate()
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 661, in terminate
self._ffi["exit_background_loop"]()
AttributeError: 'MLCEngine' object has no attribute '_ffi'
The error is the same, but more info is shown in the trace. As relevant background: right after running the modified script, a RuntimeError occurred because it could not locate an LLVM clang for Windows. So I compiled one and ran it again, and the result is the error shown in the trace above.
The error might be caused by a missing gcc and related packages. After I ran "conda install gcc", it worked on my device (Vulkan).
It may be a wrong model path: try an absolute model path.
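For example, a minimal sketch of that suggestion:
# Resolve the relative path to an absolute one before creating the engine.
import os

model = os.path.abspath("./Llama-3-8B-Instruct-q4f16_1-MLC")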
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
Error Message
Expected behavior
Environment
How you installed MLC-LLM (conda, source): pip
How you installed TVM-Unity (pip, source): pip
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): I haven't compiled any model yet, but here's the result:
If you need any other info, please tell me, I'll try to collect it. Thank you.