mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] MacBook Pro M4 Max (Apple Silicon): mlc_llm compile of Qwen2.5 q4f32 MLC .so fails #3036

Closed. l241025097 closed this issue 11 hours ago.

l241025097 commented 2 days ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

mlc_llm compile /path/to/models/qwen/Qwen2.5-32B-Instruct-q4f32_1-MLC/mlc-chat-config.json \
  -o /path/to/models/qwen/Qwen2.5-32B-Instruct-q4f32_1-MLC/libs/Qwen2.5-32B-Instruct-q4f32_1-MLC-metal.so
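For context, the model directory being compiled would typically have been produced beforehand with convert_weight and gen_config. A minimal sketch of those steps with illustrative paths; the --conv-template value is an assumption, not taken from this report:

# Quantize the original weights to q4f32_1 (paths illustrative)
mlc_llm convert_weight /path/to/Qwen2.5-32B-Instruct \
  --quantization q4f32_1 \
  -o /path/to/models/qwen/Qwen2.5-32B-Instruct-q4f32_1-MLC

# Generate mlc-chat-config.json; the template name below is an assumption,
# check `mlc_llm gen_config --help` for the supported names
mlc_llm gen_config /path/to/Qwen2.5-32B-Instruct \
  --quantization q4f32_1 --conv-template qwen2 \
  -o /path/to/models/qwen/Qwen2.5-32B-Instruct-q4f32_1-MLC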

Expected behavior

Environment

Additional context

[11:34:19] /Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.4 with -mcpu=apple-latest is not valid in -mtriple=arm64-apple-macos, using default -mcpu=generic
[11:34:19] /Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.4 with -mcpu=apple-latest is not valid in -mtriple=arm64-apple-macos, using default -mcpu=generic
[11:34:19] /Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.4 with -mcpu=apple-latest is not valid in -mtriple=arm64-apple-macos, using default -mcpu=generic
CLML Target Version: 3
[2024-11-20 11:34:19] INFO auto_config.py:70: Found model configuration: /Users/lyn/Documents/python/learn/mlc_llm/models/qwen/Qwen2.5-32B-Instruct-q4f32_1-MLC/mlc-chat-config.json
[2024-11-20 11:34:20] INFO auto_device.py:88: Not found device: cuda:0
[2024-11-20 11:34:20] INFO auto_device.py:88: Not found device: rocm:0
[2024-11-20 11:34:21] INFO auto_device.py:79: Found device: metal:0
[2024-11-20 11:34:21] INFO auto_device.py:88: Not found device: vulkan:0
[2024-11-20 11:34:22] INFO auto_device.py:88: Not found device: opencl:0
[2024-11-20 11:34:22] INFO auto_device.py:35: Using device: metal:0
[2024-11-20 11:34:22] INFO auto_target.py:78: Found configuration of target device "metal:0": {"thread_warp_size": runtime.BoxInt(32), "max_threads_per_block": runtime.BoxInt(1024), "max_function_args": runtime.BoxInt(31), "max_num_threads": runtime.BoxInt(256), "kind": "metal", "max_shared_memory_per_block": runtime.BoxInt(32768), "tag": "", "keys": ["metal", "gpu"]}
[11:34:22] /Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.4 with -mcpu=apple-m3 is not valid in -mtriple=arm64-apple-darwin24.1.0, using default -mcpu=generic
[2024-11-20 11:34:22] INFO auto_target.py:110: Found host LLVM triple: arm64-apple-darwin24.1.0
[2024-11-20 11:34:22] INFO auto_target.py:111: Found host LLVM CPU: apple-m3
[2024-11-20 11:34:22] INFO auto_config.py:154: Found model type: qwen2. Use --model-type to override.
Compiling with arguments:
  --config          QWen2Config(hidden_act='silu', hidden_size=5120, intermediate_size=27648, num_attention_heads=40, num_hidden_layers=64, num_key_value_heads=8, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=152064, tie_word_embeddings=False, context_window_size=32768, prefill_chunk_size=2048, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=80, kwargs={})
  --quantization    GroupQuantize(name='q4f32_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float32', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7, tensor_parallel_shards=0)
  --model-type      qwen2
  --target          {"thread_warp_size": runtime.BoxInt(32), "host": {"mtriple": "arm64-apple-darwin24.1.0", "tag": "", "kind": "llvm", "mcpu": "apple-m3", "keys": ["arm_cpu", "cpu"]}, "max_threads_per_block": runtime.BoxInt(1024), "max_function_args": runtime.BoxInt(31), "max_num_threads": runtime.BoxInt(256), "kind": "metal", "max_shared_memory_per_block": runtime.BoxInt(32768), "tag": "", "keys": ["metal", "gpu"]}
  --opt             flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
  --system-lib-prefix ""
  --output          /Users/lyn/Documents/python/learn/mlc_llm/models/qwen/Qwen2.5-32B-Instruct-q4f32_1-MLC/libs/Qwen2.5-32B-Instruct-q4f32_1-MLC-metal.so
  --overrides       context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None
[2024-11-20 11:34:22] INFO compile.py:140: Creating model from: QWen2Config(hidden_act='silu', hidden_size=5120, intermediate_size=27648, num_attention_heads=40, num_hidden_layers=64, num_key_value_heads=8, rms_norm_eps=1e-06, rope_theta=1000000.0, vocab_size=152064, tie_word_embeddings=False, context_window_size=32768, prefill_chunk_size=2048, tensor_parallel_shards=1, head_dim=128, dtype='float32', max_batch_size=80, kwargs={})
[2024-11-20 11:34:22] INFO compile.py:158: Exporting the model to TVM Unity compiler
[2024-11-20 11:34:24] INFO compile.py:164: Running optimizations using TVM Unity
[2024-11-20 11:34:24] INFO compile.py:185: Registering metadata: {'model_type': 'qwen2', 'quantization': 'q4f32_1', 'context_window_size': 32768, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 2048, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'kv_state_kind': 'kv_cache', 'max_batch_size': 80}
[2024-11-20 11:34:25] INFO pipeline.py:54: Running TVM Relax graph-level optimizations
[2024-11-20 11:34:27] INFO pipeline.py:54: Lowering to TVM TIR kernels
[2024-11-20 11:34:31] INFO pipeline.py:54: Running TVM TIR-level optimizations
[2024-11-20 11:34:39] INFO pipeline.py:54: Running TVM Dlight low-level optimizations
[2024-11-20 11:34:44] INFO pipeline.py:54: Lowering to VM bytecode
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function alloc_embedding_tensor: 40.00 MB
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_decode: 80.16 MB
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_prefill: 911.97 MB
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_verify: 2052.00 MB
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function create_tir_paged_kv_cache: 0.00 MB
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function decode: 1.00 MB
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function embed: 40.00 MB
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function prefill: 864.60 MB
[2024-11-20 11:34:45] INFO estimate_memory_usage.py:58: [Memory usage] Function softmax_with_temperature: 0.00 MB
[2024-11-20 11:34:46] INFO pipeline.py:54: Compiling external modules
[2024-11-20 11:34:46] INFO pipeline.py:54: Compilation complete! Exporting to disk
Traceback (most recent call last):
  File "/Users/lyn/Applications/miniforge3/envs/mlc_llm_dev/bin/mlc_llm", line 33, in <module>
    sys.exit(load_entry_point('mlc-llm', 'console_scripts', 'mlc_llm')())
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/mlc-llm/python/mlc_llm/main.py", line 33, in main
    cli.main(sys.argv[2:])
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/mlc-llm/python/mlc_llm/cli/compile.py", line 129, in main
    compile(
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/mlc-llm/python/mlc_llm/interface/compile.py", line 243, in compile
    _compile(args, model_config)
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/mlc-llm/python/mlc_llm/interface/compile.py", line 188, in _compile
    args.build_func(
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/mlc-llm/python/mlc_llm/support/auto_target.py", line 301, in build
    relax.build(
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/python/tvm/relax/vm_build.py", line 353, in build
    return _vmlink(
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/python/tvm/relax/vm_build.py", line 249, in _vmlink
    lib = tvm.build(
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/python/tvm/driver/build_module.py", line 297, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/python/tvm/_ffi/_ctypes/packed_func.py", line 245, in __call__
    raise_last_ffi_error()
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/lyn/Documents/python/learn/mlc_llm/modules/tvm-unity/src/tir/transforms/storage_rewrite.cc", line 1494
InternalError: Check failed: (me->coeff == 0 || info.factor() % me->coeff == 0) is false:
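As a quick sanity check of the device/host auto-detection shown in the log above, the Metal device can be queried directly through the bundled TVM runtime. A minimal sketch, assuming the tvm-unity Python package is importable; it should print True 1024 32, matching the detected target configuration:

python -c "import tvm; dev = tvm.metal(0); print(dev.exist, dev.max_threads_per_block, dev.warp_size)"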

MasterJH5574 commented 1 day ago

Thank you @l241025097 for reporting. We'll take a look.

MasterJH5574 commented 22 hours ago

Hi @l241025097, could you please upgrade the nightly package to the latest version and try again? We have fixed this, so the issue should be gone.
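For reference, upgrading the prebuilt nightly wheels usually looks like the following. This is a sketch based on the install docs at https://llm.mlc.ai/docs/install/mlc_llm.html; the exact package names are an assumption here:

# Upgrade to the latest nightly wheels from the MLC wheel index
# (package names assumed; on Apple Silicon the cpu wheels include Metal support)
python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cpu mlc-ai-nightly-cpu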

l241025097 commented 11 hours ago

Thank you very much. Building and installing the latest nightly version from source did indeed resolve the issue.
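For anyone landing here later, the from-source route roughly follows the documented build flow. A sketch with an illustrative job count; see https://llm.mlc.ai/docs/install/mlc_llm.html for the authoritative steps:

git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
mkdir -p build && cd build
# interactively generates config.cmake; enable Metal when prompted on macOS
python ../cmake/gen_cmake_config.py
cmake .. && cmake --build . --parallel 8 && cd ..
# install the Python package from the checkout
cd python && pip install -e .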