Closed: corv89 closed this issue 1 year ago
I am also seeing this issue with the script provided for reproducing benchmark results.
Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ROCm
Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): AMD Radeon 6900XT
How you installed MLC-LLM (conda, source): conda
How you installed TVM-Unity (pip, source): pip
Python version (e.g. 3.10): 3.11.4
GPU driver version (if applicable): amdgpu/6.1.5-1609671.22.04
TVM Unity Hash Tag:
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_LLVM: llvm-config --ignore-libllvm --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 2b204c39b53912814edc3f07e88919a5c76d00cf
USE_VULKAN: ON
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-08-08 17:21:25 -0400
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: ON
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
Same with both sample_mlc_chat.py and mlc_chat_cli; Vulkan works well though.
Debian 13 (x86_64), kernel 6.4, ROCm 5.6, GPU: RX 6800 XT, CPU: AMD 5950X
$ python sample_mlc_chat.py
System automatically detected device: rocm
Using model folder: /home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
Using mlc chat config: /home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json
Using library model: /home/user/src/mlc/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-rocm.so
Traceback (most recent call last):
File "/home/user/src/mlc/sample_mlc_chat.py", line 12, in <module>
output = cm.generate(
^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 650, in generate
self._prefill(prompt)
File "/home/user/.local/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 819, in _prefill
self._prefill_func(input, decode_next_token, place_in_prompt.value)
File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
10: TVMFuncCall
9: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1083
8: mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt)
at /workspace/mlc-llm/cpp/llm_chat.cc:611
7: mlc::llm::LLMChat::ForwardTokens(std::vector<int, std::allocator<int> >, long)
at /workspace/mlc-llm/cpp/llm_chat.cc:836
6: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
4: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
3: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
2: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
0: _ZN3tvm7runtime6deta
4: TVMFuncCall
3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::detail::PackFuncPackedArg_<0, tvm::runtime::ROCMWrappedFunc>(tvm::runtime::ROCMWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
2: tvm::runtime::ROCMWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void*, unsigned long) const [clone .isra.0]
1: tvm::runtime::ROCMModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/rocm/rocm_module.cc", line 105
File "/workspace/tvm/src/runtime/library_module.cc", line 87
TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed
$ ./mlc_chat_cli --local-id Llama-2-7b-chat-hf-q4f16_1 --device rocm
Use MLC config: "/home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json"
Use model weights: "/home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/ndarray-cache.json"
Use model library: "/home/user/src/mlc/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-rocm.so"
You can use the following special commands:
/help print the special commands
/exit quit the cli
/stats print out the latest stats (token/sec)
/reset restart a fresh chat
/reload [local_id] reload model `local_id` from disk, or reload the current model if `local_id` is not specified
Loading model...
Loading finished
Running system prompts...
[19:46:45] /home/user/src/mlc/mlc-llm/3rdparty/tvm/src/runtime/library_module.cc:87: TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed
Stack trace:
File "/home/user/src/mlc/mlc-llm/3rdparty/tvm/src/runtime/rocm/rocm_module.cc", line 105
[bt] (0) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x13) [0x7fe81b712b83]
[bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x24) [0x564de84d6ae4]
[bt] (2) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x216cb4) [0x7fe81b816cb4]
[bt] (3) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::ROCMModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x13e) [0x7fe81b8199be]
[bt] (4) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x216e36) [0x7fe81b816e36]
[bt] (5) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::detail::PackFuncPackedArg_<0, tvm::runtime::ROCMWrappedFunc>(tvm::runtime::ROCMWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0xda) [0x7fe81b819b9a]
[bt] (6) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(TVMFuncCall+0x46) [0x7fe81b6df156]
Stack trace:
[bt] (0) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x13) [0x7fe81b712b83]
[bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x24) [0x564de84d6ae4]
[bt] (2) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x10f404) [0x7fe81b70f404]
[bt] (3) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x10f5a0) [0x7fe81b70f5a0]
[bt] (4) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)+0x8c0) [0x7fe81b78ff30]
[bt] (5) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()+0x2c7) [0x7fe81b78cbd7]
[bt] (6) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)+0x24d) [0x7fe81b78d06d]
[bt] (7) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x18d455) [0x7fe81b78d455]
[bt] (8) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x277) [0x7fe81b78b787]
CC @spectrometerHBH
I tried to follow the steps on another machine with a 7900 XTX, and unfortunately was not able to reproduce the issue :-(
I noticed that the devices above are all not of the latest generation, and am not sure if this is the reason behind it. I don't have another device available to test right now.
On the ROCm installation, I tried both
sudo amdgpu-install --usecase=rocm
and
sudo amdgpu-install --usecase=hiplibsdk,rocm
as listed in https://docs.amd.com/en/docs-5.6.0/deploy/linux/installer/install.html, and both of them work on my side.
ROCm is not supported on the 6800XT, I guess. Here is AMD's official post: https://community.amd.com/t5/rocm/new-rocm-5-6-release-brings-enhancements-and-optimizations-for/ba-p/614745
Looks like only the 7900 XTX is ready for now (not sure, and I don't understand why AMD does this).
I see. In the last paragraph they said:
Formal support for RDNA 3-based GPUs on Linux is planned to begin rolling out this fall, starting with the 48GB Radeon PRO W7900 and the 24GB Radeon RX 7900 XTX, with additional cards and expanded capabilities to be released over time.
Ahhh... So sorry to hear this. We will update the docs to mention this point.
Tested that “ROCm 5.5 + 7900 XTX” also works, though I don’t expect too much that 5.5 supports other cards like 6800 XT...
How come exllama and co work on this older card? This is the first time I've run into issues with it.
Also, someone mentioned a workaround through Vulkan; how do I switch bindings?
How come exllama and co work on this older card? This is the first time I've run into issues with it.
Also, someone mentioned a workaround through Vulkan; how do I switch bindings?
./mlc_chat_cli --local-id Llama-2-7b-chat-hf-q4f16_1 --device vulkan
or
cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device="vulkan")
HTH
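For completeness, here is a minimal sketch of sample_mlc_chat.py with the device switched to Vulkan. It assumes the mlc_chat Python API shown earlier in this thread (ChatModule, generate) plus a stats() helper, and the prebuilt Llama-2 weights/lib already laid out under dist/prebuilt as in the logs above:
# Sketch only: same prebuilt Llama-2-7b-chat-hf-q4f16_1 model, but forcing the
# Vulkan device instead of the auto-detected ROCm one.
from mlc_chat import ChatModule

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device="vulkan")
output = cm.generate(prompt="What is the capital of Canada?")
print(output)
# Runtime statistics (prefill/decode tok/s), analogous to /stats in mlc_chat_cli.
print(cm.stats())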
ROCm is not supported on 6800XT I guess
It does work on the RX 6000 series, although they have never been officially supported. Like others said, it is likely a TVM-specific issue (e.g. we might need a bitcode update).
I can try to reproduce this on my rx 6600xt.
Ok I was able to reproduce this issue. I think this happens because Llama-2-7b-chat-hf-q4f16_1-rocm.so
that comes from https://github.com/mlc-ai/binary-mlc-llm-libs.git
was built for RDNA 3 (gfx 1100). So obviously it only works for the cards from that generation.
Ok I was able to reproduce this issue. I think this happens because
Llama-2-7b-chat-hf-q4f16_1-rocm.so
that comes from https://github.com/mlc-ai/binary-mlc-llm-libs.git
was built for RDNA 3 (gfx 1100). So obviously it only works for the cards from that generation.
Are those libs a secret sauce, or maybe there's a procedure for building them? I quickly looked at them using nm; they seem to export some funcs representing math ops.
All the code is open source - you need to build https://github.com/apache/tvm/tree/unity and run https://github.com/mlc-ai/mlc-llm/blob/main/build.py. There is some documentation at https://mlc.ai/mlc-llm/docs/install/tvm.html and https://mlc.ai/mlc-llm/docs/compilation/compile_models.html, but it is probably not complete for the ROCm build.
I hope a prebuilt lib for gfx1030 (Navi 2) will be provided soon, but let me know if you want to build it yourself. I just went through this exercise today, building Vicuna 7B on ROCm + RX 6600 XT, and got 50 tok/sec for decoding.
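As a quick sanity check before attempting a rebuild (a sketch, not from the docs above), one can confirm that the installed tvm-unity wheel was built with ROCm support and can see the GPU:
# Sanity-check sketch: verify ROCm is compiled in and a device is visible.
import tvm

print("USE_ROCM:", tvm.support.libinfo().get("USE_ROCM"))
dev = tvm.rocm(0)
print("ROCm device visible to TVM:", dev.exist)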
@MasterJH5574 if you use rocm -mcpu=gfx1030 to build the lib, it will work on RX 6000 devices. And it will probably work for the 7900 XTX without major regression.
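For reference, a minimal sketch of how that target string is expressed on the TVM side; how it is threaded through build.py when compiling the model lib is an assumption and may differ:
# Sketch: a ROCm target pinned to the RDNA2 ISA (gfx1030) instead of the
# gfx1100 that the prebuilt lib appears to target.
import tvm

target = tvm.target.Target("rocm -mcpu=gfx1030")
print(target.kind.name)  # "rocm"
print(target.mcpu)       # "gfx1030"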
This is good news! If I understand you correctly, the model files are built to be GPU device family dependent and we simply need to "recompile" them?
if you use rocm -mcpu=gfx1030 to build the lib, it will work on RX 6000 devices. And it will probably work for the 7900 XTX without major regression.
Thank you @masahi! The information is very important. I will try building later today, upload it, and report back when the new lib is ready.
This is good news! If I understand you correctly, the model files are built to be GPU device family dependent and we simply need to "recompile" them?
Yes, that's right.
I tried to use -mcpu=gfx1030, and it turned out that the compiled lib is not runnable on 7900 XTX, reporting the same ROCM HIP Error: hipModuleLoadData(...) error.
@MasterJH5574 could you share a quick command line/process you used to re-compile the TVM module with -mcpu=gfx1030?
I'm about to do that myself so I can run MLC models on my 6800 XT. Would love a pointer or two if you were successfully able to re-compile the module needed.
Thanks in advance!
The same issue happened on my MI250, while with the same ROCm version and setup procedure a 7900 XTX runs successfully. According to https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html, CDNA GPUs should also be supported?
I guess if it's only related to the gfx version, rocm -mcpu=gfxXXX should fix them. Is it possible to package a fatbin like what we do for CUDA multi-architecture distributions?
I believe this issue has been fixed, given that on-device compilation instructions are provided here: https://github.com/mlc-ai/llm-perf-bench#mlc-llm
🐛 Bug
sample_mlc_chat.py errors out after a while with: TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed
To Reproduce
Steps to reproduce the behavior:
python sample_mlc_chat.py
Expected behavior
Expecting it to work as well as mlc_chat_cli
Environment
How you installed MLC-LLM (conda, source): conda
How you installed TVM-Unity (pip, source): pip
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):