sin-ack opened 8 months ago
Reduced to:
```
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
target triple = "amdgcn-amd-amdhsa"

; Function Attrs: sspstrong
define amdgpu_kernel void @_ZL13mul_mat_vec_qILi4ELi32ELi4E10block_q4_0Li2EXadL_ZL17vec_dot_q4_0_q8_1PKvPK10block_q8_1RKiEEEvS2_S2_Pfiiii() #0 {
  %1 = alloca [4 x [2 x float]], i32 0, align 16, addrspace(5)
  call void @llvm.memset.p5.i64(ptr addrspace(5) %1, i8 0, i64 0, i1 false)
  ret void
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture writeonly, i8, i64, i1 immarg) #1

attributes #0 = { sspstrong }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
```
It appears that `-fstack-protector` somehow got enabled on the GPU side. I'm not sure whether AMDGPU supports it, but I would expect it to be a problem for NVPTX.
Disabling the stack protector on the GPU side should avoid the problem.
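A quick way to check whether device IR actually carries the stack protector is to look for the `ssp`, `sspstrong` or `sspreq` function attributes, which is how it shows up in the reduced IR (`attributes #0 = { sspstrong }`). The shell function below is a minimal sketch: the name `find_ssp_attrs` is hypothetical, and it assumes textual IR in which function attributes are printed as attribute groups (`attributes #N = { ... }`), as in the reduced example.

```shell
#!/bin/sh
# find_ssp_attrs FILE.ll
# Prints (with line numbers) every attribute group in textual LLVM IR that
# contains a stack-protector attribute: ssp, sspstrong or sspreq.
# Exit status follows grep: 0 if something was found, 1 if not.
find_ssp_attrs() {
  # The attribute must be a whole word, so e.g. "sspstrongfoo" is ignored.
  grep -nE '^attributes #[0-9]+ = [{].*[^A-Za-z_]ssp(strong|req)?([^A-Za-z_]|$)' "$1"
}
```

For example, `find_ssp_attrs reduced.ll` (with `reduced.ll` a hypothetical file holding the reduced IR) should print the `attributes #0` line, prefixed with its line number.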
`/usr/lib/llvm/17/bin/clang++` on Gentoo enables `-fstack-protector-strong` for all targets in `/etc/clang/x86_64-pc-linux-gnu-clang.cfg` -> `gentoo-common.cfg` -> `gentoo-hardened.cfg`.
This was previously discussed in https://github.com/llvm/llvm-project/issues/62066 and fixed in https://github.com/llvm/llvm-project/pull/70799 in the 18.1.0 release.
Additionally, on the Gentoo side, several patches were added to hipcc and rocm-runtime to add `-fno-stack-protector` when the user compiles code with the hipcc wrapper or from the ROCm runtime while using Clang 17 (sorry, I can't do better than that; Gentoo does not backport patches for LLVM). Just use hipcc; it adds several flags, as described in https://wiki.gentoo.org/wiki/HIP#hipcc_.28Clang_wrapper.29
Regarding Clang 18 support in HIP: today I did a few experiments, and with a few patches it worked, but I ran into huge memory consumption (https://github.com/llvm/llvm-project/issues/86332), which looks like a blocker. So Gentoo will probably stay on LLVM 17 for hipcc for the near future.
I don't believe AMDGPU has stack-protector support either. I would guess the desired behaviour of `-x cuda -fstack-protector` would be to enable the stack protector on the x64 code and do nothing on the GPU code, at least until it's implemented on the GPU. Maybe emit a warning in the meantime.
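To see which sub-invocation actually receives the stack protector, one option is to have the driver print its jobs with `-###` (which prints the commands to stderr without running them) and look for the `cc1` `-stack-protector` argument per target triple. The sketch below assumes the usual `-###` format, one job per line with every argument in double quotes; the function name `ssp_per_triple` is hypothetical, and `-stack-protector <level>` is the cc1 spelling we believe the driver uses for `-fstack-protector*` (level 2 corresponding to `-fstack-protector-strong`).

```shell
#!/bin/sh
# ssp_per_triple: reads the output of `clang++ -### ...` on stdin and, for
# every "-cc1" job, prints its target triple and its "-stack-protector"
# level, or "none" if the job has no such argument.
ssp_per_triple() {
  awk '
    /"-cc1"/ {
      triple = "?"; level = "none"
      sub(/"[ \t]*$/, "")          # drop the closing quote of the last argument
      # Arguments are printed as double-quoted words separated by spaces.
      n = split($0, w, "\" \"")
      for (i = 1; i < n; i++) {
        if (w[i] == "-triple") triple = w[i + 1]
        if (w[i] == "-stack-protector") level = w[i + 1]
      }
      print triple, level
    }'
}
```

Usage: `clang++ -### -x hip --offload-arch=gfx1030 -c foo.hip 2>&1 | ssp_per_triple` (`gfx1030` and `foo.hip` are placeholders). The behaviour described above would show a level for the `x86_64-pc-linux-gnu` job and `none` for the `amdgcn-amd-amdhsa` job.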
Do we have a general-purpose way of passing some argument to the host clang invocation and a different argument to the device invocation? OpenMP has (or had) some means of doing that, which worked in some cases.
We do not have a consistent way to handle arguments that don't have the same level of support on the host and the GPU. So far, in the most commonly encountered cases (e.g. sanitizers), we've been filtering out such arguments on a case-by-case basis, and that's not ideal.
We do have `-Xarch_host` and `-Xarch_device`, which may be used to override top-level flags, but that does not always work if a top-level flag gets converted into a set of different `cc1` arguments.
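As an illustration of the `-Xarch_device` route, a wrapper could append `-Xarch_device -fno-stack-protector` after everything else, so that a `-fstack-protector-strong` coming from the Gentoo config file keeps applying to the host job while the device job sees a later `-fno-stack-protector`. This is a hedged sketch, not a verified fix: the wrapper name and the `CLANGXX` variable are hypothetical, it assumes config-file options are placed before command-line options and that the last stack-protector flag wins within the device argument list, and, as noted above, `-Xarch_*` overrides do not always take effect. With Clang 18.1.0 or later it should not be needed.

```shell
#!/bin/sh
# hip_clang_no_device_ssp ARGS...
# Hypothetical wrapper: forwards all arguments to clang++ (or to $CLANGXX if
# set) and appends "-Xarch_device -fno-stack-protector", so the override is
# the last stack-protector flag seen by the device compilation only.
hip_clang_no_device_ssp() {
  "${CLANGXX:-clang++}" "$@" -Xarch_device -fno-stack-protector
}
```

Usage: `hip_clang_no_device_ssp -x hip --offload-arch=gfx1030 -c foo.hip -o foo.o` (`gfx1030` and `foo.hip` are placeholders).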
Backtrace:
LLVM IR file: ggml-cuda.cu.ll.gz
The IR was generated using Clang 17.0.6 and hipBLAS 5.7.1, from `ggml-cuda.cu` in https://github.com/ggerganov/llama.cpp/commit/67be2ce1015d070b3b2cd488bcb041eefb61de72

Command used to generate the IR:
`/usr/lib/llvm/17/bin/clang++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/labs/llama.cpp/. -isystem /usr/include/rocblas --rocm-device-lib-path=/usr/lib/amdgcn/bitcode/ -O3 -DNDEBUG -std=gnu++11 -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -march=native -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip -MD -MT CMakeFiles/ggml.dir/ggml-cuda.cu.o -o ggml-cuda.cu.ll -S /labs/llama.cpp/ggml-cuda.cu -emit-llvm`
`clang --version`:
```
clang version 17.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/17/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang.cfg
```