mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] The rwkv-raven models don't produce output tokens on MacBook Air M1. #1130

Closed xxxxyu closed 11 months ago

xxxxyu commented 1 year ago

🐛 Bug

No output tokens are produced when I use mlc-chat-cli to run the compiled rwkv-raven-1b5 and rwkv-raven-3b models.

To Reproduce

I followed the instructions here to download and compile the models.

# For 1.5B model
python3 -m mlc_llm.build --hf-path=RWKV/rwkv-raven-1b5 --target metal --quantization q4f16_2
# For 3B model
python3 -m mlc_llm.build --hf-path=RWKV/rwkv-raven-3b --target metal --quantization q4f16_2

Then I ran the model with mlc-chat-cli and got no output tokens, as shown in the screenshot. There is no error message, and the interactive UI is neither blocked nor interrupted. Both the 1.5B and 3B models fail to produce any output tokens. I haven't tried the 7B version yet.

[Screenshot, 2023-10-25 20:25: mlc-chat-cli loads the model but emits no output tokens]

Environment

anibohara2000 commented 1 year ago

Possibly related to #1136. Looking into it.

anibohara2000 commented 1 year ago

Applying #1136 fixes this.

xxxxyu commented 1 year ago

Update

I tried again with the latest commit, which includes #1136, and got a new error: TVMError: Fail to compile metal source:program_source:68:8: error: redefinition of 'take1_kernel_args_t'.

I created a new conda env and re-installed tvm unity. It seems there's something wrong with the TVM backend. Or it could be a problem in the chat CLI, since #1136 was only tested with the Android app. I'm not familiar with TVM, so I'd really appreciate it if someone could look into this problem.

To Reproduce

python3 -m mlc_llm.build --hf-path=RWKV/rwkv-raven-1b5 --target metal --quantization q4f16_2
mlc_chat_cli --model rwkv-raven-1b5-q4f16_2

Got output:

Use MLC config: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/params/mlc-chat-config.json"
Use model weights: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/params/ndarray-cache.json"
Use model library: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/rwkv-raven-1b5-q4f16_2-metal.so"
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /reload [model]  reload model `model` from disk, or reload the current model if `model` is not specified

Loading model...
Loading finished
Running system prompts...
[15:52:24] /Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698786124183/work/3rdparty/tvm/src/runtime/library_module.cc:78: TVMError: Fail to compile metal source:program_source:68:8: error: redefinition of 'take1_kernel_args_t'
struct take1_kernel_args_t {
       ^
program_source:56:8: note: previous definition is here
struct take1_kernel_args_t {
       ^
program_source:72:51: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
kernel void layer_norm_kernel(  device half* A [[ buffer(0) ]],
                                                  ^
... (many lines of similar warnings)

Stack trace:
  File "/Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698786124183/work/3rdparty/tvm/src/runtime/metal/metal_module.mm", line 109
  [bt] (0) 1   libtvm_runtime.dylib                0x000000010528cf28 tvm::runtime::detail::LogFatal::Entry::Finalize() + 68
  [bt] (1) 2   libtvm_runtime.dylib                0x000000010528cee4 tvm::runtime::detail::LogFatal::Entry::Finalize() + 0
  [bt] (2) 3   libtvm_runtime.dylib                0x0000000105286ee8 __clang_call_terminate + 0
  [bt] (3) 4   libtvm_runtime.dylib                0x00000001053b6218 tvm::runtime::MetalModuleNode::GetPipelineState(unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 1616
  [bt] (4) 5   libtvm_runtime.dylib                0x00000001053b4dbc tvm::runtime::MetalWrappedFunc::Init(tvm::runtime::MetalModuleNode*, tvm::runtime::ObjectPtr<tvm::runtime::Object>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long, unsigned long, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&) + 224
  [bt] (5) 6   libtvm_runtime.dylib                0x00000001053b27fc tvm::runtime::MetalModuleNode::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&) + 632
  [bt] (6) 7   libtvm_runtime.dylib                0x00000001052f0b08 tvm::runtime::ModuleNode::GetFunction(tvm::runtime::String const&, bool) + 100
  [bt] (7) 8   libtvm_runtime.dylib                0x00000001052f1534 tvm::runtime::ModuleNode::GetFuncFromEnv(tvm::runtime::String const&) + 244
  [bt] (8) 9   libtvm_runtime.dylib                0x000000010528b18c TVMBackendGetFuncFromEnv + 44

Stack trace:
  [bt] (0) 1   libtvm_runtime.dylib                0x000000010528cf28 tvm::runtime::detail::LogFatal::Entry::Finalize() + 68
  [bt] (1) 2   libtvm_runtime.dylib                0x000000010528cee4 tvm::runtime::detail::LogFatal::Entry::Finalize() + 0
  [bt] (2) 3   libtvm_runtime.dylib                0x0000000105286ee8 __clang_call_terminate + 0
  [bt] (3) 4   libtvm_runtime.dylib                0x00000001052e5de4 tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 200
  [bt] (4) 5   libtvm_runtime.dylib                0x0000000105370774 tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 96
  [bt] (5) 6   libtvm_runtime.dylib                0x00000001053726e4 tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction) + 1504
  [bt] (6) 7   libtvm_runtime.dylib                0x0000000105371e0c tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop() + 100
  [bt] (7) 8   libtvm_runtime.dylib                0x0000000105371ab8 tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long long, std::__1::vector<tvm::runtime::TVMRetValue, std::__1::allocator<tvm::runtime::TVMRetValue>> const&) + 364
  [bt] (8) 9   libtvm_runtime.dylib                0x0000000105376eb8 tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::$_14>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 204

Environment

anibohara2000 commented 1 year ago

It works on the latest main branch on my machine. Can you update the tvm submodule before building mlc_chat_cli?
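A typical way to do that (hedged sketch: it assumes a source checkout directory named mlc-llm with TVM vendored at 3rdparty/tvm, as in the log paths above; the exact rebuild steps then follow the project's build docs):

```shell
# Sync the checkout and its vendored TVM before rebuilding mlc_chat_cli.
# Guarded so the snippet is a no-op where no checkout exists.
if [ -d mlc-llm ]; then
  cd mlc-llm
  git pull
  # Bring 3rdparty/tvm to the commit pinned by the updated checkout.
  git submodule update --init --recursive 3rdparty/tvm
fi
```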

junrushao commented 1 year ago

@MasterJH5574 Will this PR on TVM upstream fix the Metal codegen issue?

junrushao commented 11 months ago

It should be fixed by the latest nightly pip wheel. Could you guys confirm? Thanks!

xxxxyu commented 11 months ago

Sorry, I've been busy with other things. I just tried again and it failed. @junrushao This time it seems to be a dependency issue rather than a compilation issue, so I opened a new issue: #1247.

xxxxyu commented 11 months ago

It works with the latest wheel; all problems solved, thanks.