[Bug] Vulkan returns gibberish

🐛 Bug

I'm trying to run on an Asus Zephyrus G14 (2022), but ROCm doesn't work (it fails with the error "shared object initialization failed") and Vulkan returns complete nonsense.

HSA_OVERRIDE_GFX_VERSION=10.3.0 ./mlc-llm/build/mlc_chat_cli --model Llama-2-7b-chat-hf-q4f16_1 --device vulkan
Use MLC config: "/home/james/projects/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json"
Use model weights: "/home/james/projects/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/ndarray-cache.json"
Use model library: "/home/james/projects/mlc/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-vulkan.so"
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /reload [model]  reload model `model` from disk, or reload the current model if `model` is not specified

Loading model...
Loading finished
Running system prompts...
System prompts finished
[INST]: Return one word.
[/INST]: падŌ[$ Begriffe clockwig Frierner Classificationawodieben∙ departure [aperCONFIGtxtмін Als programmeliofahr Anleitung Renaissanceaper mir�og rareмінἄ helping CataloguemtcherŌgressちiahчі Mey只 MeyModpedogvirtiatre� execution nodatreaperPA торomedrotgesellschaft latticeogмінnomircirc Helamentircliolefминatre Matrixlio CanadElŌちenerchen surási latticeirćaper장праirc Begriffefahr CalendarohlŌ Shahog長ллиowski potential conjug deletingfahramentppooleiche CavSERoshomed appropridependenciestxtditaperaperosh� Concfahrirc cohircサуmacibeinate [Layerслі Begriffección PDO� Format Bahn Lost Loogirc Matrixска�awottaligologistomedotta deletowski Catalogueielsmin ES [orth [ folgender writing marryardealЄlnweltirc|ast ок assembleLomc Ott alignedatre embeddedintro[omedesterday texts —og lossfahrodbllomedlgole actumartwelt coefficientresourcesllomet ['ью radiちvdichecciónatingfahr Articles struomedoted folgenderody� Desesterdayppo CavogfaultREAD�Text accordllпри

I've tried with and without HSA_OVERRIDE_GFX_VERSION set, with the same result.

The same issue occurs with the sample Python app.

Expected behavior

Environment

OS: Manjaro Linux x86_64
Host: ROG Zephyrus G14 GA402RK_GA402RK 1.0
Kernel: 6.4.9-arch1-1-g14
CPU: AMD Ryzen 9 6900HS with Radeon Graphics (16) @ 3.300GHz
GPU: AMD ATI Radeon 680M
GPU: AMD ATI Radeon RX 6650 XT / 6700S / 6800S
Memory: 2622MiB / 23254MiB

How you installed MLC-LLM (conda, source): ROCm version from https://mlc.ai/package/, following the Python instructions from https://mlc.ai/mlc-llm/docs/get_started/try_out.html
How you installed TVM-Unity (pip, source): pip
Python version (e.g. 3.10): 3.11
GPU driver version (if applicable): extra/vulkan-radeon 23.1.6-2, multilib/lib32-vulkan-radeon 23.1.6-1, amdgpu-dkms 19.30_855429-0, mhwd-amdgpu 19.1.0-1, vulkan-amdgpu-pro 23.10_1610704-1

TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):

python -c "import tvm;print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_ETHOSU: ON
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: /opt/arm/acl
USE_CPP_RTVM: 
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --ignore-libllvm --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 34bd9be0a80c49b2fcbd082542e5aba955a450f7
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2023-08-29 19:32:42 +0800
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: /opt/arm/ethosn-driver
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: ON
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 10.0.1
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: ON
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: ON
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION: 
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /opt/rh/devtoolset-9/root/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
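
The dump above is long, and the GPU-related entries are easy to miss (for example, it lists USE_VULKAN: OFF and USE_ROCM: OFF). Below is a minimal sketch for pulling just the backend flags out of such a dump, assuming it has been captured as plain "KEY: VALUE" text. The function name `backend_flags` and the default key list are hypothetical, chosen for illustration, and are not part of TVM. It only filters the text and says nothing about whether these flags relate to the gibberish output.

```python
def backend_flags(dump, keys=("USE_VULKAN", "USE_ROCM", "USE_CUDA",
                              "USE_OPENCL", "USE_METAL")):
    """Return {key: value} for the selected keys in a 'KEY: VALUE' text dump.

    `keys` is an illustrative default; pass any other libinfo keys of interest.
    """
    wanted = set(keys)
    flags = {}
    for line in dump.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            flags[key.strip()] = value.strip()
    return flags


if __name__ == "__main__":
    # A few lines copied from the dump above.
    sample = (
        "USE_LLVM: llvm-config --ignore-libllvm --link-static\n"
        "USE_VULKAN: OFF\n"
        "USE_ROCM: OFF\n"
    )
    for name, value in backend_flags(sample).items():
        print(f"{name}: {value}")
```

With TVM installed, the same text can be produced by joining the items of `tvm.support.libinfo()` exactly as in the command above and passing the result to `backend_flags`.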

Any other relevant information:
