xxxxyu closed this issue 11 months ago.
Possibly related to #1136. Looking into it.
Applying #1136 fixes this.
Tried with the latest commit of #1136; got a new error: TVMError: Fail to compile metal source:program_source:68:8: error: redefinition of 'take1_kernel_args_t'
I created a new conda env and re-installed TVM Unity. It seems there's something wrong with the TVM backend, or it could be a problem with the chat CLI, since #1136 was only tested with the Android app. I'm not familiar with TVM, so I'd really appreciate it if someone could kindly look into this problem. The commands I ran:
python3 -m mlc_llm.build --hf-path=RWKV/rwkv-raven-1b5 --target metal --quantization q4f16_2
mlc_chat_cli --model rwkv-raven-1b5-q4f16_2
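(For what it's worth, a hedged diagnostic that is not part of the original report: since the question is whether the TVM backend or the chat CLI is at fault, one quick check is which TVM Unity build each side actually picks up, using the same tvm.support.libinfo() query as in the Environment section below.)
python -c "import tvm; print(tvm.__file__)"                              # which installed TVM package Python resolves
python -c "import tvm; print(tvm.support.libinfo()['GIT_COMMIT_HASH'])"  # the TVM commit that wheel was built from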
Running mlc_chat_cli gave the following output:
Use MLC config: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/params/mlc-chat-config.json"
Use model weights: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/params/ndarray-cache.json"
Use model library: "/Users/xyu/Development/llm/mlc-llm/dist/rwkv-raven-1b5-q4f16_2/rwkv-raven-1b5-q4f16_2-metal.so"
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [model] reload model `model` from disk, or reload the current model if `model` is not specified
Loading model...
Loading finished
Running system prompts...
[15:52:24] /Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698786124183/work/3rdparty/tvm/src/runtime/library_module.cc:78: TVMError: Fail to compile metal source:program_source:68:8: error: redefinition of 'take1_kernel_args_t'
struct take1_kernel_args_t {
^
program_source:56:8: note: previous definition is here
struct take1_kernel_args_t {
^
program_source:72:51: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
kernel void layer_norm_kernel( device half* A [[ buffer(0) ]],
^
... (many lines of similar warnings)
Stack trace:
File "/Users/catalyst/Workspace/miniforge3/envs/mlc-llm-build/conda-bld/mlc-chat-cli-nightly-package_1698786124183/work/3rdparty/tvm/src/runtime/metal/metal_module.mm", line 109
[bt] (0) 1 libtvm_runtime.dylib 0x000000010528cf28 tvm::runtime::detail::LogFatal::Entry::Finalize() + 68
[bt] (1) 2 libtvm_runtime.dylib 0x000000010528cee4 tvm::runtime::detail::LogFatal::Entry::Finalize() + 0
[bt] (2) 3 libtvm_runtime.dylib 0x0000000105286ee8 __clang_call_terminate + 0
[bt] (3) 4 libtvm_runtime.dylib 0x00000001053b6218 tvm::runtime::MetalModuleNode::GetPipelineState(unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 1616
[bt] (4) 5 libtvm_runtime.dylib 0x00000001053b4dbc tvm::runtime::MetalWrappedFunc::Init(tvm::runtime::MetalModuleNode*, tvm::runtime::ObjectPtr<tvm::runtime::Object>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long, unsigned long, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&) + 224
[bt] (5) 6 libtvm_runtime.dylib 0x00000001053b27fc tvm::runtime::MetalModuleNode::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&) + 632
[bt] (6) 7 libtvm_runtime.dylib 0x00000001052f0b08 tvm::runtime::ModuleNode::GetFunction(tvm::runtime::String const&, bool) + 100
[bt] (7) 8 libtvm_runtime.dylib 0x00000001052f1534 tvm::runtime::ModuleNode::GetFuncFromEnv(tvm::runtime::String const&) + 244
[bt] (8) 9 libtvm_runtime.dylib 0x000000010528b18c TVMBackendGetFuncFromEnv + 44
Stack trace:
[bt] (0) 1 libtvm_runtime.dylib 0x000000010528cf28 tvm::runtime::detail::LogFatal::Entry::Finalize() + 68
[bt] (1) 2 libtvm_runtime.dylib 0x000000010528cee4 tvm::runtime::detail::LogFatal::Entry::Finalize() + 0
[bt] (2) 3 libtvm_runtime.dylib 0x0000000105286ee8 __clang_call_terminate + 0
[bt] (3) 4 libtvm_runtime.dylib 0x00000001052e5de4 tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 200
[bt] (4) 5 libtvm_runtime.dylib 0x0000000105370774 tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 96
[bt] (5) 6 libtvm_runtime.dylib 0x00000001053726e4 tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction) + 1504
[bt] (6) 7 libtvm_runtime.dylib 0x0000000105371e0c tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop() + 100
[bt] (7) 8 libtvm_runtime.dylib 0x0000000105371ab8 tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long long, std::__1::vector<tvm::runtime::TVMRetValue, std::__1::allocator<tvm::runtime::TVMRetValue>> const&) + 364
[bt] (8) 9 libtvm_runtime.dylib 0x0000000105376eb8 tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::$_14>>::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 204
How you installed MLC-LLM (conda, source): source
How you installed TVM-Unity (pip, source): conda
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 3183686672bf08316e404dfe915978653709809a
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-10-31 13:58:33 -0700
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Library/Developer/CommandLineTools/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
It works on the latest main branch on my machine. Can you update the tvm submodule before building mlc_chat_cli?
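For reference, updating the submodule and rebuilding would look roughly like the following, assuming a source checkout of mlc-llm (the exact CMake options, e.g. for enabling Metal, are not covered in this thread, so treat this as a sketch):
cd mlc-llm
git pull
git submodule update --init --recursive   # sync 3rdparty/tvm to the commit pinned on main
mkdir -p build && cd build
cmake ..                                  # plus whatever options your setup needs; not shown here
cmake --build . --parallel                # rebuild so the CLI links against the updated submodule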
@MasterJH5574 will this PR on TVM upstream fix the Metal codegen issue?
It should be fixed by the latest nightly pip wheel. Could you guys confirm? Thanks!
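For anyone confirming, upgrading to the nightly wheels would be along these lines, using the mlc.ai wheel index (package names follow the install docs of that period; treat the exact command as an assumption, not something stated in this thread):
pip install --pre --force-reinstall -f https://mlc.ai/wheels mlc-ai-nightly mlc-chat-nightly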
Sorry, I've been busy with other things; I just tried again and it failed. @junrushao This time it seems to be a dependency issue rather than a compilation issue, so I opened a new issue, #1247.
Working with the latest wheel now, all problems solved, thanks.
🐛 Bug
There are no output tokens when I use mlc-chat-cli to run the compiled rwkv-raven-1b5 and 3b models.
To Reproduce
I followed the instructions here to download and compile the models.
Then I ran the model with mlc-chat-cli and got no output tokens, as shown in the screenshot. There is no error message, and the interactive UI is neither blocked nor interrupted. Both the 1.5B and 3B models fail to produce any output tokens. I haven't tried the 7B version yet.
Environment
How you installed MLC-LLM (conda, source): source
How you installed TVM-Unity (pip, source): conda
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):