mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] When initializing MLCEngine, getting AttributeError: 'MLCEngine' object has no attribute '_ffi' #3035

Closed: lifelongeeek closed this issue 8 hours ago

lifelongeeek commented 4 days ago

🐛 Bug

When initializing MLCEngine, I got the following error: AttributeError: 'MLCEngine' object has no attribute '_ffi'. Here is the full error log:

CLML Target Version:  3
[2024-11-18 22:57:06] INFO auto_device.py:88: Not found device: cuda:0
[2024-11-18 22:57:07] INFO auto_device.py:88: Not found device: rocm:0
[2024-11-18 22:57:08] INFO auto_device.py:79: Found device: metal:0
[2024-11-18 22:57:10] INFO auto_device.py:88: Not found device: vulkan:0
[2024-11-18 22:57:11] INFO auto_device.py:88: Not found device: opencl:0
[2024-11-18 22:57:11] INFO auto_device.py:35: Using device: metal:0
[2024-11-18 22:57:11] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-11-18 22:57:11] INFO jit.py:118: Compiling using commands below:
[2024-11-18 22:57:11] INFO jit.py:119: /opt/anaconda3/bin/python -m mlc_llm compile /Users/geonminkim/Nota/mlc-llm/dist/models/qwen-2-vp10-q0f16-MLC --opt 'flashinfer=1;cublas_gemm=1;faster_transformer=0;cudagraph=1;cutlass=1;ipc_allreduce_strategy=NONE' --overrides '' --device metal:0 --output /var/folders/m2/yyb0wzkx0m19b5nwx1npqqth0000gn/T/tmpz99qm20o/lib.dylib
CLML Target Version:  3
[2024-11-18 22:57:12] INFO auto_config.py:70: Found model configuration: /Users/geonminkim/Nota/mlc-llm/dist/models/qwen-2-vp10-q0f16-MLC/mlc-chat-config.json
[2024-11-18 22:57:12] INFO auto_target.py:91: Detecting target device: metal:0
[2024-11-18 22:57:12] INFO auto_target.py:93: Found target: {"thread_warp_size": runtime.BoxInt(32), "max_threads_per_block": runtime.BoxInt(1024), "max_function_args": runtime.BoxInt(31), "max_num_threads": runtime.BoxInt(256), "kind": "metal", "max_shared_memory_per_block": runtime.BoxInt(32768), "tag": "", "keys": ["metal", "gpu"]}
[2024-11-18 22:57:12] INFO auto_target.py:110: Found host LLVM triple: arm64-apple-darwin23.3.0
[2024-11-18 22:57:12] INFO auto_target.py:111: Found host LLVM CPU: apple-m1
[2024-11-18 22:57:12] INFO auto_config.py:154: Found model type: qwen2. Use `--model-type` to override.
Compiling with arguments:
  --config          QWen2Config(hidden_act='silu', hidden_size=3584, intermediate_size=18944, num_attention_heads=28, num_hidden_layers=4, num_key_value_heads=4, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=136488, tie_word_embeddings=False, context_window_size=256, prefill_chunk_size=128, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=1, kwargs={})
  --quantization    NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
  --model-type      qwen2
  --target          {"thread_warp_size": runtime.BoxInt(32), "host": {"mtriple": "arm64-apple-darwin23.3.0", "tag": "", "kind": "llvm", "mcpu": "apple-m1", "keys": ["arm_cpu", "cpu"]}, "max_threads_per_block": runtime.BoxInt(1024), "max_function_args": runtime.BoxInt(31), "max_num_threads": runtime.BoxInt(256), "kind": "metal", "max_shared_memory_per_block": runtime.BoxInt(32768), "tag": "", "keys": ["metal", "gpu"]}
  --opt             flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
  --system-lib-prefix ""
  --output          /var/folders/m2/yyb0wzkx0m19b5nwx1npqqth0000gn/T/tmpz99qm20o/lib.dylib
  --overrides       context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None
[2024-11-18 22:57:12] INFO compile.py:140: Creating model from: QWen2Config(hidden_act='silu', hidden_size=3584, intermediate_size=18944, num_attention_heads=28, num_hidden_layers=4, num_key_value_heads=4, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=136488, tie_word_embeddings=False, context_window_size=256, prefill_chunk_size=128, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=1, kwargs={})
[2024-11-18 22:57:12] INFO compile.py:158: Exporting the model to TVM Unity compiler
[2024-11-18 22:57:13] INFO compile.py:164: Running optimizations using TVM Unity
[2024-11-18 22:57:13] INFO compile.py:185: Registering metadata: {'model_type': 'qwen2', 'quantization': 'q0f16', 'context_window_size': 256, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 128, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'kv_state_kind': 'kv_cache', 'max_batch_size': 1}
[2024-11-18 22:57:13] INFO pipeline.py:54: Running TVM Relax graph-level optimizations
[2024-11-18 22:57:13] INFO pipeline.py:54: Lowering to TVM TIR kernels
[2024-11-18 22:57:14] INFO pipeline.py:54: Running TVM TIR-level optimizations
[2024-11-18 22:57:14] INFO pipeline.py:54: Running TVM Dlight low-level optimizations
[2024-11-18 22:57:22] INFO pipeline.py:54: Lowering to VM bytecode
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `alloc_embedding_tensor`: 0.88 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_decode`: 0.67 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_prefill`: 19.03 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_verify`: 85.14 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `decode`: 0.67 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `embed`: 0.88 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `prefill`: 19.03 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-11-18 22:57:22] INFO pipeline.py:54: Compiling external modules
[2024-11-18 22:57:22] INFO pipeline.py:54: Compilation complete! Exporting to disk
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/__main__.py", line 64, in <module>
    main()
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/__main__.py", line 33, in main
    cli.main(sys.argv[2:])
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/cli/compile.py", line 129, in main
    compile(
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/interface/compile.py", line 243, in compile
    _compile(args, model_config)
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/interface/compile.py", line 188, in _compile
    args.build_func(
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/support/auto_target.py", line 301, in build
    relax.build(
  File "/opt/anaconda3/lib/python3.11/site-packages/tvm/relax/vm_build.py", line 353, in build
    return _vmlink(
           ^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/tvm/relax/vm_build.py", line 249, in _vmlink
    lib = tvm.build(
          ^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/tvm/driver/build_module.py", line 297, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/opt/anaconda3/lib/python3.11/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/tir/transforms/storage_rewrite.cc", line 1494
InternalError: Check failed: (me->coeff == 0 || info.factor() % me->coeff == 0) is false: 
Traceback (most recent call last):
  File "/Users/geonminkim/Nota/mlc-llm/compare_hf_and_mlc_chat_examples.py", line 148, in <module>
    main(args)
  File "/Users/geonminkim/Nota/mlc-llm/compare_hf_and_mlc_chat_examples.py", line 66, in main
    engine = MLCEngine(mlc_path)
             ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine.py", line 1467, in __init__
    super().__init__(
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 591, in __init__
    ) = _process_model_args(models, device, engine_config)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 172, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 172, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 165, in _convert_model_info
    model_lib = jit.jit(
                ^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/interface/jit.py", line 164, in jit
    _run_jit(
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/interface/jit.py", line 124, in _run_jit
    raise RuntimeError("Cannot find compilation output, compilation failed")
RuntimeError: Cannot find compilation output, compilation failed
Exception ignored in: <function MLCEngineBase.__del__ at 0x1385af880>
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 655, in __del__
    self.terminate()
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 662, in terminate
    self._ffi["exit_background_loop"]()
    ^^^^^^^^^
AttributeError: 'MLCEngine' object has no attribute '_ffi'
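
Note that the final AttributeError is only a secondary symptom: the JIT compilation fails first (the TVM InternalError above), so MLCEngine.__init__ aborts before self._ffi is ever assigned, and Python still invokes __del__ on the half-constructed object during cleanup. A minimal sketch of the pattern (illustrative only, not MLC's actual code; the Engine class below is hypothetical):

class Engine:
    def __init__(self):
        self._jit_compile()  # raises, so the next line never runs
        self._ffi = {"exit_background_loop": lambda: None}

    def _jit_compile(self):
        raise RuntimeError("Cannot find compilation output, compilation failed")

    def __del__(self):
        # Runs even on the half-built object; _ffi was never assigned, so this
        # raises AttributeError, which Python reports as "Exception ignored in".
        # A defensive variant would use getattr(self, "_ffi", None) and skip.
        self._ffi["exit_background_loop"]()

try:
    Engine()
except RuntimeError as err:
    print("real error:", err)  # the AttributeError is only cleanup noise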

To Reproduce

from mlc_llm import MLCEngine

mlc_path = "ed-nt/qwen-2-vp10-q0f16-MLC"
engine = MLCEngine(mlc_path)

Expected behavior

MLCEngine is loaded successfully.
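
For reference, a successful load can be smoke-tested with the OpenAI-style chat API from the MLC quickstart; a sketch under that assumption, reusing the model path from the repro above:

from mlc_llm import MLCEngine

model = "ed-nt/qwen-2-vp10-q0f16-MLC"
engine = MLCEngine(model)
# Stream one short completion; reaching this point means __init__ finished
# and self._ffi was assigned, so __del__ can clean up normally.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Say hello."}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()
engine.terminate()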

Environment

MasterJH5574 commented 1 day ago

Thank you folks for reporting. We are taking a look.

BoltzmannEntropy commented 1 day ago

Same issue here. Any solution? This does not seem to help:

python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-cpu==0.17.1 mlc-ai-cpu==0.17.1

MasterJH5574 commented 22 hours ago

Hi @lifelongeeek, as shown in the error message you shared, the real issue in this case is

tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/tir/transforms/storage_rewrite.cc", line 1494
InternalError: Check failed: (me->coeff == 0 || info.factor() % me->coeff == 0) is false:

Could you please upgrade to the latest nightly package? We fixed it and the issue is likely gone.
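
If a stale JIT artifact is suspected after upgrading, the policy logged above (MLC_JIT_POLICY; values ON, OFF, REDO, READONLY) can force a rebuild. A hedged sketch, assuming the variable is read when mlc_llm is imported:

import os

# Force the JIT to recompile the model library instead of reusing a cached
# one. Valid values per the log above: ON, OFF, REDO, READONLY.
os.environ["MLC_JIT_POLICY"] = "REDO"

from mlc_llm import MLCEngine  # import after setting the policy

engine = MLCEngine("ed-nt/qwen-2-vp10-q0f16-MLC")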

MasterJH5574 commented 22 hours ago

@BoltzmannEntropy Could you please try the latest nightly package? You can find the installation instructions at https://llm.mlc.ai/docs/install/mlc_llm.html. Please let us know if the problem persists, thanks.

MasterJH5574 commented 8 hours ago

Closing the issue as it has been fixed. See also #3036.