jackuh105 opened 1 month ago
Hi @jackuh105, thanks for reporting and sharing the detailed error message. From the backtrace it looks like the actual error happens during download:
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\support\download_cache.py", line 228, in get_or_download_model
model_path = download_and_cache_mlc_weights(model)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\support\download_cache.py", line 192, in download_and_cache_mlc_weights
futures.append(executor.submit(download_file, file_url, file_dest, file_md5))
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\concurrent\futures\process.py", line 732, in submit
self._adjust_process_count()
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\concurrent\futures\process.py", line 692, in _adjust_process_count
self._spawn_process()
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\concurrent\futures\process.py", line 709, in _spawn_process
p.start()
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
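For reference, the idiom the message refers to looks roughly like this (a minimal sketch; run() is a hypothetical placeholder for whatever the script actually does):
# A minimal sketch of the main-module idiom named in the error above.
# run() is a hypothetical placeholder for the script's actual work.
import multiprocessing

def run():
    ...  # anything that may spawn worker processes (e.g. parallel downloads)

if __name__ == "__main__":
    multiprocessing.freeze_support()  # only needed if frozen into an executable
    run()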
In this case, could you make sure that you have git-lfs
installed and try again? If it still doesn't work, you can alternatively clone the model manually and then run the updated demo.py
below against the downloaded repo.
git lfs install
git clone https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
# demo.py
from mlc_llm import MLCEngine
# Create engine
model = "./Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)
# Run chat completion via the OpenAI-compatible API.
for response in engine.chat.completions.create(
messages=[{"role": "user", "content": "What is the meaning of life?"}],
model=model,
stream=True,
):
for choice in response.choices:
print(choice.delta.content, end="", flush=True)
print("\n")
engine.terminate()
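Note that the relative model path above assumes demo.py is run from the directory that contains the cloned folder; an absolute path to the clone also works.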
We will try to update our error message to avoid this confusion. Thanks for bringing it up.
@MasterJH5574 Thank you for your support. I already have git-lfs
installed. After changing the model path to the git-cloned one, the script runs longer before the error occurs:
[23:07:53] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[23:07:53] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[23:07:53] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[2024-10-17 23:07:57] INFO auto_device.py:88: Not found device: cuda:0
[2024-10-17 23:07:58] INFO auto_device.py:88: Not found device: rocm:0
[2024-10-17 23:07:59] INFO auto_device.py:88: Not found device: metal:0
[2024-10-17 23:08:03] INFO auto_device.py:79: Found device: vulkan:0
[2024-10-17 23:08:04] INFO auto_device.py:88: Not found device: opencl:0
[2024-10-17 23:08:04] INFO auto_device.py:35: Using device: vulkan:0
[2024-10-17 23:08:05] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-10-17 23:08:05] INFO jit.py:118: Compiling using commands below:
[2024-10-17 23:08:05] INFO jit.py:119: 'D:\projects\llm-app\mlc-llm\.env\Scripts\python.exe' -m mlc_llm compile Llama-3-8B-Instruct-q4f16_1-MLC --opt 'flashinfer=1;cublas_gemm=1;faster_transformer=0;cudagraph=1;cutlass=1;ipc_allreduce_strategy=NONE' --overrides '' --device vulkan:0 --output 'C:\Users\jackc\AppData\Local\Temp\tmpxket0nak\lib.dll'
[23:08:05] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[23:08:05] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[23:08:05] D:\a\package\package\tvm\src\target\llvm\llvm_instance.cc:226: Error: Using LLVM 19.1.1 with `-mcpu=apple-latest` is not valid in `-mtriple=arm64-apple-macos`, using default `-mcpu=generic`
[2024-10-17 23:08:06] INFO auto_config.py:70: Found model configuration: Llama-3-8B-Instruct-q4f16_1-MLC\mlc-chat-config.json
[2024-10-17 23:08:06] INFO auto_target.py:91: Detecting target device: vulkan:0
[2024-10-17 23:08:06] INFO auto_target.py:93: Found target: {"thread_warp_size": runtime.BoxInt(1), "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(49152), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
[2024-10-17 23:08:06] INFO auto_target.py:110: Found host LLVM triple: x86_64-pc-windows-msvc
[2024-10-17 23:08:06] INFO auto_target.py:111: Found host LLVM CPU: alderlake
[2024-10-17 23:08:06] INFO auto_config.py:154: Found model type: llama. Use `--model-type` to override.
Compiling with arguments:
--config LlamaConfig(hidden_size=4096, intermediate_size=14336, num_attention_heads=32, num_hidden_layers=32, rms_norm_eps=1e-05, vocab_size=128256, tie_word_embeddings=False, position_embedding_base=500000.0, rope_scaling=None, context_window_size=8192, prefill_chunk_size=8192, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, pipeline_parallel_stages=1, max_batch_size=128, kwargs={})
--quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7, tensor_parallel_shards=0)
--model-type llama
--target {"thread_warp_size": runtime.BoxInt(1), "host": {"mtriple": "x86_64-pc-windows-msvc", "tag": "", "kind": "llvm", "mcpu": "alderlake", "keys": ["cpu"]}, "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(49152), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
--opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
--system-lib-prefix ""
--output C:\Users\jackc\AppData\Local\Temp\tmpxket0nak\lib.dll
--overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None
[2024-10-17 23:08:06] INFO compile.py:140: Creating model from: LlamaConfig(hidden_size=4096, intermediate_size=14336, num_attention_heads=32, num_hidden_layers=32, rms_norm_eps=1e-05, vocab_size=128256, tie_word_embeddings=False, position_embedding_base=500000.0, rope_scaling=None, context_window_size=8192, prefill_chunk_size=8192, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, pipeline_parallel_stages=1, max_batch_size=128, kwargs={})
[2024-10-17 23:08:06] INFO compile.py:158: Exporting the model to TVM Unity compiler
[2024-10-17 23:08:10] INFO compile.py:164: Running optimizations using TVM Unity
[2024-10-17 23:08:10] INFO compile.py:185: Registering metadata: {'model_type': 'llama', 'quantization': 'q4f16_1', 'context_window_size': 8192, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 8192, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'kv_state_kind': 'kv_cache', 'max_batch_size': 128}
[2024-10-17 23:08:12] INFO pipeline.py:54: Running TVM Relax graph-level optimizations
[2024-10-17 23:08:17] INFO pipeline.py:54: Lowering to TVM TIR kernels
[2024-10-17 23:08:26] INFO pipeline.py:54: Running TVM TIR-level optimizations
[2024-10-17 23:08:45] INFO pipeline.py:54: Running TVM Dlight low-level optimizations
[2024-10-17 23:08:47] INFO pipeline.py:54: Lowering to VM bytecode
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `alloc_embedding_tensor`: 64.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `argsort_probs`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_decode`: 18.50 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_decode_to_last_hidden_states`: 19.50 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_prefill`: 1185.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_prefill_to_last_hidden_states`: 1248.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_select_last_hidden_states`: 1.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_verify`: 1184.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_verify_to_last_hidden_states`: 1248.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `decode`: 0.14 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `decode_to_last_hidden_states`: 0.15 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `embed`: 64.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `gather_hidden_states`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `get_logits`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `multinomial_from_uniform`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `prefill`: 1184.01 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `prefill_to_last_hidden_states`: 1248.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `renormalize_by_top_p`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `sample_with_top_p`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `sampler_take_probs`: 0.01 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `sampler_verify_draft_tokens`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `scatter_hidden_states`: 0.00 MB
[2024-10-17 23:08:51] INFO estimate_memory_usage.py:58: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-10-17 23:08:53] INFO pipeline.py:54: Compiling external modules
[2024-10-17 23:08:53] INFO pipeline.py:54: Compilation complete! Exporting to disk
Traceback (most recent call last):
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\jackc\.pyenv\pyenv-win\versions\3.10.9\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\__main__.py", line 64, in <module>
main()
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\__main__.py", line 33, in main
cli.main(sys.argv[2:])
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\cli\compile.py", line 129, in main
compile(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\interface\compile.py", line 243, in compile
_compile(args, model_config)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\interface\compile.py", line 188, in _compile
args.build_func(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\support\auto_target.py", line 316, in build
).export_library(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\tvm\relax\vm_build.py", line 146, in export_library
return self.mod.export_library(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\tvm\runtime\module.py", line 628, in export_library
return fcompile(file_name, files, **kwargs)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\tvm\contrib\cc.py", line 96, in create_shared
_windows_compile(output, objects, options, cwd, ccache_env)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\tvm\contrib\cc.py", line 418, in _windows_compile
raise RuntimeError(msg)
RuntimeError: Compilation error:
clang -O2 --target=x86_64 -shared -o C:\Users\jackc\AppData\Local\Temp\tmpxket0nak\lib.dll C:\Users\jackc\AppData\Local\Temp\tmphif53jgb\lib0.o C:\Users\jackc\AppData\Local\Temp\tmphif53jgb\devc.o
C:\Users\jackc\AppData\Local\Temp\tmphif53jgb\lib0.o: file not recognized: File format not recognized
collect2.exe: error: ld returned 1 exit status
clang: error: linker (via gcc) command failed with exit code 1 (use -v to see invocation)
Traceback (most recent call last):
File "D:\projects\llm-app\mlc-llm\demo2.py", line 5, in <module>
engine = MLCEngine(model)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine.py", line 1467, in __init__
super().__init__(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 590, in __init__
) = _process_model_args(models, device, engine_config)
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 171, in _process_model_args
model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 171, in <listcomp>
model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 164, in _convert_model_info
model_lib = jit.jit(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\interface\jit.py", line 164, in jit
_run_jit(
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\interface\jit.py", line 124, in _run_jit
raise RuntimeError("Cannot find compilation output, compilation failed")
RuntimeError: Cannot find compilation output, compilation failed
Exception ignored in: <function MLCEngineBase.__del__ at 0x000002034B8EACB0>
Traceback (most recent call last):
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 654, in __del__
self.terminate()
File "D:\projects\llm-app\mlc-llm\.env\lib\site-packages\mlc_llm\serve\engine_base.py", line 661, in terminate
self._ffi["exit_background_loop"]()
AttributeError: 'MLCEngine' object has no attribute '_ffi'
The error is the same, but more info is shown in the trace. As relevant background: right after running the modified script, a RuntimeError occurred because it could not locate an LLVM clang for Windows. So I compiled one and ran it again, and the result is the error shown in the trace above.
The error might be caused by a missing gcc and related packages. After I ran "conda install gcc", it worked on my device (Vulkan).
It may be a wrong model path: try an absolute model path.
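For example, a minimal sketch of that suggestion:
# Resolve the relative path to an absolute one before creating the engine.
import os

model = os.path.abspath("./Llama-3-8B-Instruct-q4f16_1-MLC")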
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
Error Message
Expected behavior
Environment
How you installed MLC-LLM (conda, source): pip
How you installed TVM-Unity (pip, source): pip
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models): I haven't compiled any model yet, but here's the result:
If you need any other info, please tell me, I'll try to collect it. Thank you.