mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] When initializing MLCEngine, getting AttributeError: 'MLCEngine' object has no attribute '_ffi' #3035

Closed: lifelongeeek closed this issue 8 hours ago

lifelongeeek commented 4 days ago

🐛 Bug

When initializing MLCEngine, I got the following error: AttributeError: 'MLCEngine' object has no attribute '_ffi'. Here is the full error log:

CLML Target Version:  3
[2024-11-18 22:57:06] INFO auto_device.py:88: Not found device: cuda:0
[2024-11-18 22:57:07] INFO auto_device.py:88: Not found device: rocm:0
[2024-11-18 22:57:08] INFO auto_device.py:79: Found device: metal:0
[2024-11-18 22:57:10] INFO auto_device.py:88: Not found device: vulkan:0
[2024-11-18 22:57:11] INFO auto_device.py:88: Not found device: opencl:0
[2024-11-18 22:57:11] INFO auto_device.py:35: Using device: metal:0
[2024-11-18 22:57:11] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-11-18 22:57:11] INFO jit.py:118: Compiling using commands below:
[2024-11-18 22:57:11] INFO jit.py:119: /opt/anaconda3/bin/python -m mlc_llm compile /Users/geonminkim/Nota/mlc-llm/dist/models/qwen-2-vp10-q0f16-MLC --opt 'flashinfer=1;cublas_gemm=1;faster_transformer=0;cudagraph=1;cutlass=1;ipc_allreduce_strategy=NONE' --overrides '' --device metal:0 --output /var/folders/m2/yyb0wzkx0m19b5nwx1npqqth0000gn/T/tmpz99qm20o/lib.dylib
CLML Target Version:  3
[2024-11-18 22:57:12] INFO auto_config.py:70: Found model configuration: /Users/geonminkim/Nota/mlc-llm/dist/models/qwen-2-vp10-q0f16-MLC/mlc-chat-config.json
[2024-11-18 22:57:12] INFO auto_target.py:91: Detecting target device: metal:0
[2024-11-18 22:57:12] INFO auto_target.py:93: Found target: {"thread_warp_size": runtime.BoxInt(32), "max_threads_per_block": runtime.BoxInt(1024), "max_function_args": runtime.BoxInt(31), "max_num_threads": runtime.BoxInt(256), "kind": "metal", "max_shared_memory_per_block": runtime.BoxInt(32768), "tag": "", "keys": ["metal", "gpu"]}
[2024-11-18 22:57:12] INFO auto_target.py:110: Found host LLVM triple: arm64-apple-darwin23.3.0
[2024-11-18 22:57:12] INFO auto_target.py:111: Found host LLVM CPU: apple-m1
[2024-11-18 22:57:12] INFO auto_config.py:154: Found model type: qwen2. Use `--model-type` to override.
Compiling with arguments:
  --config          QWen2Config(hidden_act='silu', hidden_size=3584, intermediate_size=18944, num_attention_heads=28, num_hidden_layers=4, num_key_value_heads=4, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=136488, tie_word_embeddings=False, context_window_size=256, prefill_chunk_size=128, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=1, kwargs={})
  --quantization    NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
  --model-type      qwen2
  --target          {"thread_warp_size": runtime.BoxInt(32), "host": {"mtriple": "arm64-apple-darwin23.3.0", "tag": "", "kind": "llvm", "mcpu": "apple-m1", "keys": ["arm_cpu", "cpu"]}, "max_threads_per_block": runtime.BoxInt(1024), "max_function_args": runtime.BoxInt(31), "max_num_threads": runtime.BoxInt(256), "kind": "metal", "max_shared_memory_per_block": runtime.BoxInt(32768), "tag": "", "keys": ["metal", "gpu"]}
  --opt             flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
  --system-lib-prefix ""
  --output          /var/folders/m2/yyb0wzkx0m19b5nwx1npqqth0000gn/T/tmpz99qm20o/lib.dylib
  --overrides       context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None
[2024-11-18 22:57:12] INFO compile.py:140: Creating model from: QWen2Config(hidden_act='silu', hidden_size=3584, intermediate_size=18944, num_attention_heads=28, num_hidden_layers=4, num_key_value_heads=4, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=136488, tie_word_embeddings=False, context_window_size=256, prefill_chunk_size=128, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=1, kwargs={})
[2024-11-18 22:57:12] INFO compile.py:158: Exporting the model to TVM Unity compiler
[2024-11-18 22:57:13] INFO compile.py:164: Running optimizations using TVM Unity
[2024-11-18 22:57:13] INFO compile.py:185: Registering metadata: {'model_type': 'qwen2', 'quantization': 'q0f16', 'context_window_size': 256, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 128, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'kv_state_kind': 'kv_cache', 'max_batch_size': 1}
[2024-11-18 22:57:13] INFO pipeline.py:54: Running TVM Relax graph-level optimizations
[2024-11-18 22:57:13] INFO pipeline.py:54: Lowering to TVM TIR kernels
[2024-11-18 22:57:14] INFO pipeline.py:54: Running TVM TIR-level optimizations
[2024-11-18 22:57:14] INFO pipeline.py:54: Running TVM Dlight low-level optimizations
[2024-11-18 22:57:22] INFO pipeline.py:54: Lowering to VM bytecode
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `alloc_embedding_tensor`: 0.88 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_decode`: 0.67 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_prefill`: 19.03 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `batch_verify`: 85.14 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `decode`: 0.67 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `embed`: 0.88 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `prefill`: 19.03 MB
[2024-11-18 22:57:22] INFO estimate_memory_usage.py:58: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-11-18 22:57:22] INFO pipeline.py:54: Compiling external modules
[2024-11-18 22:57:22] INFO pipeline.py:54: Compilation complete! Exporting to disk
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/__main__.py", line 64, in <module>
    main()
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/__main__.py", line 33, in main
    cli.main(sys.argv[2:])
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/cli/compile.py", line 129, in main
    compile(
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/interface/compile.py", line 243, in compile
    _compile(args, model_config)
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/interface/compile.py", line 188, in _compile
    args.build_func(
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/support/auto_target.py", line 301, in build
    relax.build(
  File "/opt/anaconda3/lib/python3.11/site-packages/tvm/relax/vm_build.py", line 353, in build
    return _vmlink(
           ^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/tvm/relax/vm_build.py", line 249, in _vmlink
    lib = tvm.build(
          ^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/tvm/driver/build_module.py", line 297, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/opt/anaconda3/lib/python3.11/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/tir/transforms/storage_rewrite.cc", line 1494
InternalError: Check failed: (me->coeff == 0 || info.factor() % me->coeff == 0) is false: 
Traceback (most recent call last):
  File "/Users/geonminkim/Nota/mlc-llm/compare_hf_and_mlc_chat_examples.py", line 148, in <module>
    main(args)
  File "/Users/geonminkim/Nota/mlc-llm/compare_hf_and_mlc_chat_examples.py", line 66, in main
    engine = MLCEngine(mlc_path)
             ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine.py", line 1467, in __init__
    super().__init__(
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 591, in __init__
    ) = _process_model_args(models, device, engine_config)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 172, in _process_model_args
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 172, in <listcomp>
    model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 165, in _convert_model_info
    model_lib = jit.jit(
                ^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/interface/jit.py", line 164, in jit
    _run_jit(
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/interface/jit.py", line 124, in _run_jit
    raise RuntimeError("Cannot find compilation output, compilation failed")
RuntimeError: Cannot find compilation output, compilation failed
Exception ignored in: <function MLCEngineBase.__del__ at 0x1385af880>
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 655, in __del__
    self.terminate()
  File "/opt/anaconda3/lib/python3.11/site-packages/mlc_llm/serve/engine_base.py", line 662, in terminate
    self._ffi["exit_background_loop"]()
    ^^^^^^^^^
AttributeError: 'MLCEngine' object has no attribute '_ffi'
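
Note that the final AttributeError is only a secondary symptom: the JIT compilation fails first (the TVM InternalError above), so MLCEngine.__init__ aborts before self._ffi is ever assigned, and Python still invokes __del__ on the half-constructed object during cleanup. A minimal sketch of the pattern (illustrative only, not MLC's actual code; the Engine class below is hypothetical):

class Engine:
    def __init__(self):
        self._jit_compile()  # raises, so the next line never runs
        self._ffi = {"exit_background_loop": lambda: None}

    def _jit_compile(self):
        raise RuntimeError("Cannot find compilation output, compilation failed")

    def __del__(self):
        # Runs even on the half-built object; _ffi was never assigned, so this
        # raises AttributeError, which Python reports as "Exception ignored in".
        # A defensive variant would use getattr(self, "_ffi", None) and skip.
        self._ffi["exit_background_loop"]()

try:
    Engine()
except RuntimeError as err:
    print("real error:", err)  # the AttributeError is only cleanup noise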

To Reproduce

from mlc_llm import MLCEngine

mlc_path = "ed-nt/qwen-2-vp10-q0f16-MLC"
engine = MLCEngine(mlc_path)

Expected behavior

MLCEngine is loaded successfully.
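
For reference, a successful load can be smoke-tested with the OpenAI-style chat API from the MLC quickstart; a sketch under that assumption, reusing the model path from the repro above:

from mlc_llm import MLCEngine

model = "ed-nt/qwen-2-vp10-q0f16-MLC"
engine = MLCEngine(model)
# Stream one short completion; reaching this point means __init__ finished
# and self._ffi was assigned, so __del__ can clean up normally.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Say hello."}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()
engine.terminate()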

Environment

MasterJH5574 commented 1 day ago

Thank you folks for reporting. We are taking a look.

BoltzmannEntropy commented 1 day ago

Same issue here. Any solution? This does not seem to help:

python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-cpu==0.17.1 mlc-ai-cpu==0.17.1

MasterJH5574 commented 22 hours ago

Hi @lifelongeeek, as shown in the error message you shared, the real issue in this case is

tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/tir/transforms/storage_rewrite.cc", line 1494
InternalError: Check failed: (me->coeff == 0 || info.factor() % me->coeff == 0) is false:

Could you please upgrade to the latest nightly package? We fixed it and the issue is likely gone.
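
If a stale JIT artifact is suspected after upgrading, the policy logged above (MLC_JIT_POLICY; values ON, OFF, REDO, READONLY) can force a rebuild. A hedged sketch, assuming the variable is read when mlc_llm is imported:

import os

# Force the JIT to recompile the model library instead of reusing a cached
# one. Valid values per the log above: ON, OFF, REDO, READONLY.
os.environ["MLC_JIT_POLICY"] = "REDO"

from mlc_llm import MLCEngine  # import after setting the policy

engine = MLCEngine("ed-nt/qwen-2-vp10-q0f16-MLC")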

MasterJH5574 commented 22 hours ago

@BoltzmannEntropy Could you please try the latest nightly package? You can find the installation instructions at https://llm.mlc.ai/docs/install/mlc_llm.html. Please let us know if the problem persists, thanks.

MasterJH5574 commented 8 hours ago

Closing the issue as it has been fixed. See also #3036.