xxxxyu closed this issue 11 months ago.
Possibly related to #1136. Looking into it.
Applying #1136 fixes this.
Tried with the latest commit of #1136; got a new error: TVMError: Fail to compile metal source:program_source:68:8: error: redefinition of 'take1_kernel_args_t'
I created a new conda env and re-installed TVM Unity. It seems there's something wrong with the TVM backend, or it could be a problem with the chat CLI, since #1136 was only tested with the Android app. I'm not familiar with TVM, so I'd really appreciate it if someone could kindly look into this problem. The commands I ran:
python3 -m mlc_llm.build --hf-path=RWKV/rwkv-raven-1b5 --target metal --quantization q4f16_2
mlc_chat_cli --model rwkv-raven-1b5-q4f16_2
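(For what it's worth, a hedged diagnostic that is not part of the original report: since the question is whether the TVM backend or the chat CLI is at fault, one quick check is which TVM Unity build each side actually picks up, using the same tvm.support.libinfo() query as in the Environment section below.)
python -c "import tvm; print(tvm.__file__)"                              # which installed TVM package Python resolves
python -c "import tvm; print(tvm.support.libinfo()['GIT_COMMIT_HASH'])"  # the TVM commit that wheel was built from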
Running mlc_chat_cli gave the following output:
Use MLC config: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/params/mlc-chat-config.json"
Use model weights: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/params/ndarray-cache.json"
Use model library: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/rwkv-raven-1b5-q4f16_2-metal.so"
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [model] reload model `model` from disk, or reload the current model if `model` is not specified
Loading model...
Loading finished
Running system prompts...
[15:52:24] /Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698786124183/work/3rdparty/tvm/src/runtime/library_module.cc:78: TVMError: Fail to compile metal source:program_source:68:8: error: redefinition of 'take1_kernel_args_t'
struct take1_kernel_args_t {
^
program_source:56:8: note: previous definition is here
struct take1_kernel_args_t {
^
program_source:72:51: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
kernel void layer_norm_kernel( device half* A [[ buffer(0) ]],
^
... (many lines of similar warnings)
Stack trace:
File "/Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698786124183/work/3rdparty/tvm/src/runtime/metal/metal_module.mm", line 109
[bt] (0) 1 libtvm_runtime.dylib 0x000000010528cf28 tvm::runtime::detail::LogFatal::Entry::Finalize() + 68
[bt] (1) 2 libtvm_runtime.dylib 0x000000010528cee4 tvm::runtime::detail::LogFatal::Entry::Finalize() + 0
[bt] (2) 3 libtvm_runtime.dylib 0x0000000105286ee8 __clang_call_terminate + 0
[bt] (3) 4 libtvm_runtime.dylib 0x00000001053b6218 tvm::runtime::MetalModuleNode::GetPipelineState(unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 1616
[bt] (4) 5 libtvm_runtime.dylib 0x00000001053b4dbc tvm::runtime::MetalWrappedFunc::Init(tvm::runtime::MetalModuleNode*, tvm::runtime::ObjectPtr<tvm::runtime::Object>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long, unsigned long, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&) + 224
[bt] (5) 6 libtvm_runtime.dylib 0x00000001053b27fc tvm::runtime::MetalModuleNode::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&) + 632
[bt] (6) 7 libtvm_runtime.dylib 0x00000001052f0b08 tvm::runtime::ModuleNode::GetFunction(tvm::runtime::String const&, bool) + 100
[bt] (7) 8 libtvm_runtime.dylib 0x00000001052f1534 tvm::runtime::ModuleNode::GetFuncFromEnv(tvm::runtime::String const&) + 244
[bt] (8) 9 libtvm_runtime.dylib 0x000000010528b18c TVMBackendGetFuncFromEnv + 44
Stack trace:
[bt] (0) 1 libtvm_runtime.dylib 0x000000010528cf28 tvm::runtime::detail::LogFatal::Entry::Finalize() + 68
[bt] (1) 2 libtvm_runtime.dylib 0x000000010528cee4 tvm::runtime::detail::LogFatal::Entry::Finalize() + 0
[bt] (2) 3 libtvm_runtime.dylib 0x0000000105286ee8 __clang_call_terminate + 0
[bt] (3) 4 libtvm_runtime.dylib 0x00000001052e5de4 tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 200
[bt] (4) 5 libtvm_runtime.dylib 0x0000000105370774 tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 96
[bt] (5) 6 libtvm_runtime.dylib 0x00000001053726e4 tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction) + 1504
[bt] (6) 7 libtvm_runtime.dylib 0x0000000105371e0c tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop() + 100
[bt] (7) 8 libtvm_runtime.dylib 0x0000000105371ab8 tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long long, std::__1::vector<tvm::runtime::TVMRetValue, std::__1::allocator<tvm::runtime::TVMRetValue>> const&) + 364
[bt] (8) 9 libtvm_runtime.dylib 0x0000000105376eb8 tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::$_14>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 204
How you installed MLC-LLM (conda, source): source
How you installed TVM-Unity (pip, source): conda
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 3183686672bf08316e404dfe915978653709809a
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-10-31 13:58:33 -0700
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Library/Developer/CommandLineTools/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
It works on the latest main branch on my machine. Can you update the tvm submodule before building mlc_chat_cli?
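For reference, updating the submodule and rebuilding would look roughly like the following, assuming a source checkout of mlc-llm (the exact CMake options, e.g. for enabling Metal, are not covered in this thread, so treat this as a sketch):
cd mlc-llm
git pull
git submodule update --init --recursive   # sync 3rdparty/tvm to the commit pinned on main
mkdir -p build && cd build
cmake ..                                  # plus whatever options your setup needs; not shown here
cmake --build . --parallel                # rebuild so the CLI links against the updated submodule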
@MasterJH5574 will this PR on TVM upstream fix the Metal codegen issue?
It should be fixed by the latest nightly pip wheel. Could you guys confirm? Thanks!
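For anyone confirming, upgrading to the nightly wheels would be along these lines, using the mlc.ai wheel index (package names follow the install docs of that period; treat the exact command as an assumption, not something stated in this thread):
pip install --pre --force-reinstall -f https://mlc.ai/wheels mlc-ai-nightly mlc-chat-nightly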
Sorry, I've been busy with other things; I just tried again and it failed. @junrushao This time it seems to be a dependency issue rather than a compilation issue, so I opened a new issue, #1247.
Working with the latest wheel now, all problems solved, thanks.
🐛 Bug
There are no output tokens when I use mlc-chat-cli to run the compiled rwkv-raven-1b5 and 3b models.
To Reproduce
I followed the instructions here to download and compile the models.
Then I ran the model with mlc-chat-cli and got no output tokens, as shown in the screenshot. There is no error message, and the interactive UI is neither blocked nor interrupted. Both the 1.5B and 3B models fail to produce any output tokens. I haven't tried the 7B version yet.
Environment
How you installed MLC-LLM (conda, source): source
How you installed TVM-Unity (pip, source): conda
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):