mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

ROCM HIP Error: shared object initialization failed #727

Closed corv89 closed 1 year ago

corv89 commented 1 year ago

🐛 Bug

sample_mlc_chat.py errors out after a while with: TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed
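
For reference, a minimal sketch of what sample_mlc_chat.py boils down to, reconstructed from the traceback below (the prompt string here is illustrative, not taken from the original script):

from mlc_chat import ChatModule

# Build the chat module from the downloaded prebuilt weights and library;
# device selection is automatic and picks "rocm" on this machine.
cm = ChatModule(model="Llama-2-13b-chat-hf-q4f16_1")

# The error is raised during the prefill step triggered by generate().
output = cm.generate(prompt="What is the meaning of life?")
print(output)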

To Reproduce

Steps to reproduce the behavior:

  1. Install ROCm 5.6
  2. Follow get started instructions
  3. Download models and amend sample code
  4. Run python sample_mlc_chat.py
System automatically detected device: rocm
Using model folder: /home/corv/Downloads/mlc/dist/prebuilt/mlc-chat-Llama-2-13b-chat-hf-q4f16_1
Using mlc chat config: /home/corv/Downloads/mlc/dist/prebuilt/mlc-chat-Llama-2-13b-chat-hf-q4f16_1/mlc-chat-config.json
Using library model: /home/corv/Downloads/mlc/dist/prebuilt/lib/Llama-2-13b-chat-hf-q4f16_1-rocm.so

Traceback (most recent call last):
  File "/home/corv/Downloads/mlc/sample_mlc_chat.py", line 8, in <module>
    cm = ChatModule(model="Llama-2-13b-chat-hf-q4f16_1")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Traceback (most recent call last):
  File "/home/corv/Downloads/mlc/sample_mlc_chat.py", line 12, in <module>
    output = cm.generate(
             ^^^^^^^^^^^^
  File "/home/corv/.pyenv/versions/3.11.4/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 641, in generate
    self._prefill(prompt)
  File "/home/corv/.pyenv/versions/3.11.4/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 810, in _prefill
    self._prefill_func(input, decode_next_token, place_in_prompt.value)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  10: TVMFuncCall
  9: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /workspace/mlc-llm/cpp/llm_chat.cc:1083
  8: mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt)
        at /workspace/mlc-llm/cpp/llm_chat.cc:611
  7: mlc::llm::LLMChat::ForwardTokens(std::vector<int, std::allocator<int> >, long)
        at /workspace/mlc-llm/cpp/llm_chat.cc:836
  6: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
  3: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
  2: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
  1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: _ZN3tvm7runtime6deta
  4: TVMFuncCall
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::detail::PackFuncPackedArg_<0, tvm::runtime::ROCMWrappedFunc>(tvm::runtime::ROCMWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::runtime::ROCMWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void*, unsigned long) const [clone .isra.0]
  1: tvm::runtime::ROCMModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/rocm/rocm_module.cc", line 105
  File "/workspace/tvm/src/runtime/library_module.cc", line 87
TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed

Expected behavior

Expected sample_mlc_chat.py to work just as well as mlc_chat_cli does.

Environment

monorimet commented 1 year ago

I am also seeing this issue with the script provided for reproducing benchmark results.

Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ROCm
Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): AMD Radeon 6900XT
How you installed MLC-LLM (conda, source): conda
How you installed TVM-Unity (pip, source): pip
Python version (e.g. 3.10): 3.11.4
GPU driver version (if applicable): amdgpu/6.1.5-1609671.22.04
TVM Unity Hash Tag:

USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_ETHOSU: 
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM: 
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_LLVM: llvm-config --ignore-libllvm --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 2b204c39b53912814edc3f07e88919a5c76d00cf
USE_VULKAN: ON
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-08-08 17:21:25 -0400
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION: 
USE_MIOPEN: OFF
USE_ROCM: ON
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
robertswiecki commented 1 year ago

Same with both sample_mlc_chat.py and mlc_chat_cli; Vulkan works well, though.

Debian 13 (x86_64), kernel 6.4, ROCm 5.6
GPU: RX 6800 XT
CPU: AMD 5950X

$ python sample_mlc_chat.py 
System automatically detected device: rocm
Using model folder: /home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
Using mlc chat config: /home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json
Using library model: /home/user/src/mlc/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-rocm.so

Traceback (most recent call last):
  File "/home/user/src/mlc/sample_mlc_chat.py", line 12, in <module>
    output = cm.generate(
             ^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 650, in generate
    self._prefill(prompt)
  File "/home/user/.local/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 819, in _prefill
    self._prefill_func(input, decode_next_token, place_in_prompt.value)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  10: TVMFuncCall
  9: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /workspace/mlc-llm/cpp/llm_chat.cc:1083
  8: mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt)
        at /workspace/mlc-llm/cpp/llm_chat.cc:611
  7: mlc::llm::LLMChat::ForwardTokens(std::vector<int, std::allocator<int> >, long)
        at /workspace/mlc-llm/cpp/llm_chat.cc:836
  6: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
  3: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
  2: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
  1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: _ZN3tvm7runtime6deta
  4: TVMFuncCall
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::detail::PackFuncPackedArg_<0, tvm::runtime::ROCMWrappedFunc>(tvm::runtime::ROCMWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::runtime::ROCMWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void*, unsigned long) const [clone .isra.0]
  1: tvm::runtime::ROCMModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/rocm/rocm_module.cc", line 105
  File "/workspace/tvm/src/runtime/library_module.cc", line 87
TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed
$ ./mlc_chat_cli --local-id Llama-2-7b-chat-hf-q4f16_1 --device rocm
Use MLC config: "/home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json"
Use model weights: "/home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/ndarray-cache.json"
Use model library: "/home/user/src/mlc/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-rocm.so"
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /reload [local_id]  reload model `local_id` from disk, or reload the current model if `local_id` is not specified

Loading model...
Loading finished
Running system prompts...
[19:46:45] /home/user/src/mlc/mlc-llm/3rdparty/tvm/src/runtime/library_module.cc:87: TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed
Stack trace:
  File "/home/user/src/mlc/mlc-llm/3rdparty/tvm/src/runtime/rocm/rocm_module.cc", line 105
  [bt] (0) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x13) [0x7fe81b712b83]
  [bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x24) [0x564de84d6ae4]
  [bt] (2) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x216cb4) [0x7fe81b816cb4]
  [bt] (3) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::ROCMModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x13e) [0x7fe81b8199be]
  [bt] (4) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x216e36) [0x7fe81b816e36]
  [bt] (5) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::detail::PackFuncPackedArg_<0, tvm::runtime::ROCMWrappedFunc>(tvm::runtime::ROCMWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0xda) [0x7fe81b819b9a]
  [bt] (6) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(TVMFuncCall+0x46) [0x7fe81b6df156]

Stack trace:
  [bt] (0) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x13) [0x7fe81b712b83]
  [bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x24) [0x564de84d6ae4]
  [bt] (2) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x10f404) [0x7fe81b70f404]
  [bt] (3) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x10f5a0) [0x7fe81b70f5a0]
  [bt] (4) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)+0x8c0) [0x7fe81b78ff30]
  [bt] (5) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()+0x2c7) [0x7fe81b78cbd7]
  [bt] (6) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)+0x24d) [0x7fe81b78d06d]
  [bt] (7) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x18d455) [0x7fe81b78d455]
  [bt] (8) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x277) [0x7fe81b78b787]
junrushao commented 1 year ago

CC @spectrometerHBH

MasterJH5574 commented 1 year ago

I tried to follow the steps on another machine with 7900 XTX, and unfortunately was not able to reproduce the issue :-(

I noticed that the devices above are all not from the latest generation, and I’m not sure if that is the reason. I don’t have a device available to test right now.

On the ROCm installation, I tried both

sudo amdgpu-install --usecase=rocm

and

sudo amdgpu-install --usecase=hiplibsdk,rocm

as listed in https://docs.amd.com/en/docs-5.6.0/deploy/linux/installer/install.html, and both work on my machine.

Hzfengsy commented 1 year ago

ROCm is not supported on 6800XT I guess. Here is AMD's official post: https://community.amd.com/t5/rocm/new-rocm-5-6-release-brings-enhancements-and-optimizations-for/ba-p/614745

It looks like only the 7900 XTX is supported for now (I'm not sure why AMD does this).

MasterJH5574 commented 1 year ago

I see. In the last paragraph they said:

Formal support for RDNA 3-based GPUs on Linux is planned to begin rolling out this fall, starting with the 48GB Radeon PRO W7900 and the 24GB Radeon RX 7900 XTX, with additional cards and expanded capabilities to be released over time.

Ahhh... So sorry to hear this. We will update the docs to mention this point.

MasterJH5574 commented 1 year ago

Tested that “ROCm 5.5 + 7900 XTX” also works, though I don’t expect 5.5 to support other cards like the 6800 XT either...

corv89 commented 1 year ago

How come exllama and co work on this older card? This is the first time I've run into issues with it.

Also, someone mentioned a workaround thru Vulkan, how do I switch bindings?

robertswiecki commented 1 year ago

How come exllama and co work on this older card? This is the first time I've run into issues with it.

Also, someone mentioned a workaround thru Vulkan, how do I switch bindings?

./mlc_chat_cli --local-id Llama-2-7b-chat-hf-q4f16_1 --device vulkan

or

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device="vulkan")

HTH

masahi commented 1 year ago

ROCm is not supported on 6800XT I guess

It does work on the RX 6000 series, although those cards have never been officially supported. As others said, it is likely a TVM-specific issue (e.g. we might need a bitcode update).

I can try to reproduce this on my rx 6600xt.

masahi commented 1 year ago

Ok, I was able to reproduce this issue. I think it happens because the Llama-2-7b-chat-hf-q4f16_1-rocm.so that comes from https://github.com/mlc-ai/binary-mlc-llm-libs.git was built for RDNA 3 (gfx1100), so it only works on cards from that generation.

robertswiecki commented 1 year ago

Ok, I was able to reproduce this issue. I think it happens because the Llama-2-7b-chat-hf-q4f16_1-rocm.so that comes from https://github.com/mlc-ai/binary-mlc-llm-libs.git was built for RDNA 3 (gfx1100), so it only works on cards from that generation.

Are those libs a secret sauce, or is there a procedure for building them? I had a quick look at them with nm; they seem to export some functions representing math ops.

masahi commented 1 year ago

All the code is open source: you need to build https://github.com/apache/tvm/tree/unity and run https://github.com/mlc-ai/mlc-llm/blob/main/build.py. There is some documentation at https://mlc.ai/mlc-llm/docs/install/tvm.html and https://mlc.ai/mlc-llm/docs/compilation/compile_models.html, but it is probably not complete for a ROCm build.

I hope a prebuilt lib for gfx1030 (Navi 2) will be provided soon, but let me know if you want to build it yourself. I went through this exercise today to build Vicuna 7B on ROCm + an RX 6600 XT and got 50 tok/sec for decoding.

@MasterJH5574 if you use rocm -mcpu=gfx1030 to build the lib, it will work on RX 6000 devices, and it will probably work on the 7900 XTX without major regression.
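
In case it helps, a minimal sketch of what that target override looks like on the TVM side (this assumes a TVM Unity build with ROCm enabled; how the target string is passed through build.py may differ, so check the compilation docs linked above):

import tvm

# Pin ROCm code generation to the RDNA 2 ISA (gfx1030) rather than the
# gfx1100 that the prebuilt libs appear to be built for.
target = tvm.target.Target("rocm -mcpu=gfx1030")
print(target)  # shows the expanded target string, including -mcpu=gfx1030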

corv89 commented 1 year ago

This is good news! If I understand you correctly, the model files are built to be GPU device family dependent and we simply need to "recompile" them?

MasterJH5574 commented 1 year ago

if you use rocm -mcpu=gfx1030 to build the lib, it will work on RX 6000 devices, and it will probably work on the 7900 XTX without major regression.

Thank you @masahi! This information is very helpful. I will try a build later today, upload it, and report back when the new lib is ready.

masahi commented 1 year ago

This is good news! If I understand you correctly, the model files are built to be GPU device family dependent and we simply need to "recompile" them?

Yes, that's right.

MasterJH5574 commented 1 year ago

I tried using -mcpu=gfx1030, and it turned out that the compiled lib is not runnable on the 7900 XTX; it reports the same ROCM HIP Error: hipModuleLoadData(...) error.

peacepenguin commented 1 year ago

@MasterJH5574 could you share a quick command line / process you used to re-compile the TVM module with -mcpu=gfx1030?

I'm about to do the same so I can run MLC models on my 6800 XT. I would love a pointer or two if you were able to re-compile the module successfully.

Thanks in advance!

andywy110 commented 1 year ago

The same issue happens on my MI250, while a 7900 XTX runs successfully with the same ROCm version and setup procedure. According to https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html, CDNA GPUs should also be supported?

junrushao commented 1 year ago

I guess if it's only related to the gfx version, rocm -mcpu=gfxXXX should fix it. Is it possible to package a fatbin like what we did for the CUDA multi-architecture distribution?

junrushao commented 1 year ago

I believe this issue has been fixed, given that on-device compilation instructions are provided here: https://github.com/mlc-ai/llm-perf-bench#mlc-llm