Crash when compiling kernel from Geekbench 6 Horizon Detection

kpet commented 1 year ago

When compiling the attached source, with the following options:

clspv source.cl  -cl-fast-relaxed-math -cl-single-precision-constant  -cl-kernel-arg-info   -fp16=0  -rewrite-packed-structs  -std430-ubo-layout  -decorate-nonuniform    -arch=spir  --use-native-builtins=acos,acosh,acospi,asin,asinh,asinpi,atan,atan2,atan2pi,atanh,atanpi,ceil,copysign,fabs,fdim,floor,fma,fmax,fmin,frexp,half_rsqrt,half_sqrt,isequal,isfinite,isgreater,isgreaterequal,isinf,isless,islessequal,islessgreater,isnan,isnormal,isnotequal,isordered,isunordered,ldexp,mad,rint,round,rsqrt,signbit,sqrt,tanh,trunc,  -spv-version=1.6  -max-pushconstant-size=256  -max-ubo-size=65536  -global-offset  -long-vector  -module-constants-in-storage-buffer  -cl-arm-non-uniform-work-group-size     -o compiled.spv

clspv crashes as follows (I don't know why the assert is not firing; assertions are enabled):

Program received signal SIGSEGV, Segmentation fault.
llvm::GetElementPtrInst::Create (PointeeType=0x55555ae79d98, Ptr=0x55500fa14be0, IdxList=..., NameStr=..., InsertBefore=0x55555b1e6450)
    at /path/to/clvk/external/clspv/third_party/llvm/llvm/include/llvm/IR/Instructions.h:972
972         assert(cast<PointerType>(Ptr->getType()->getScalarType())
(gdb) bt
#0  llvm::GetElementPtrInst::Create (PointeeType=0x55555ae79d98, Ptr=0x55500fa14be0, IdxList=..., NameStr=..., InsertBefore=0x55555b1e6450)
    at /path/to/clvk/external/clspv/third_party/llvm/llvm/include/llvm/IR/Instructions.h:972
#1  0x0000555555ba45e6 in clspv::SimplifyPointerBitcastPass::runOnGEPImplicitCasts (this=<optimized out>, M=...)
    at /path/to/clvk/external/clspv/lib/SimplifyPointerBitcastPass.cpp:490
#2  0x0000555555ba527a in clspv::SimplifyPointerBitcastPass::run (this=0x55555b1be238, M=...) at /path/to/clvk/external/clspv/lib/SimplifyPointerBitcastPass.cpp:52
#3  0x0000555555a89f91 in llvm::detail::PassModel<llvm::Module, clspv::SimplifyPointerBitcastPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (this=<optimized out>, IR=..., AM=...) at /path/to/clvk/external/clspv/third_party/llvm/llvm/include/llvm/IR/PassManagerInternal.h:89
#4  0x000055555906e5a1 in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (this=<optimized out>, IR=..., 
    AM=...) at /path/to/clvk/external/clspv/third_party/llvm/llvm/include/llvm/IR/PassManager.h:517
#5  0x0000555555aa5458 in (anonymous namespace)::RunPassPipeline (binaryStream=0x7fffffff8b40, M=...) at /path/to/clvk/external/clspv/lib/Compiler.cpp:740
#6  (anonymous namespace)::CompileModule (module=std::unique_ptr<llvm::Module> = {...}, output_buffer=output_buffer@entry=0x0, output_log=0x0, input_filename=...)
    at /path/to/clvk/external/clspv/lib/Compiler.cpp:1101
#7  0x0000555555aa678c in (anonymous namespace)::CompileProgram (input_filename=..., 
    program="// Copyright (C) 2004-2020 Primate Labs Inc.  All Rights Reserved.\r\n\r\n// Matches struct Line in hough.h\r\nstruct Line {\r\n  int theta;\r\n  float rho;\r\n  int score;\r\n};\r\n\r\n// Linear interpolate between (0"..., output_buffer=output_buffer@entry=0x0, output_log=output_log@entry=0x0) at /path/to/clvk/external/clspv/lib/Compiler.cpp:1148
#8  0x0000555555aa7383 in clspv::Compile (argc=<optimized out>, argv=<optimized out>) at /path/to/clvk/external/clspv/lib/Compiler.cpp:1210
#9  0x00007ffff785168a in __libc_start_call_main (main=main@entry=0x5555559f75d0 <main(int, char const* const*)>, argc=argc@entry=20, argv=argv@entry=0x7fffffffca08)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#10 0x00007ffff7851745 in __libc_start_main_impl (main=0x5555559f75d0 <main(int, char const* const*)>, argc=20, argv=0x7fffffffca08, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffc9f8) at ../csu/libc-start.c:360
#11 0x0000555555a88a41 in _start ()

source.txt

alan-baker commented 1 year ago

The core of this problem is that clspv is trying to perform multiple transforms on the same instruction. It identifies a few implicit casts up front, handles the first, and in doing so erases the basis of the second.
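To illustrate the hazard described above, here is a minimal sketch in plain C++ (no LLVM dependency). Everything in it is hypothetical: `ToyFunction`, `PendingCast`, and `applyRewrites` are illustration-only names, instructions are modeled as integer ids, and the liveness check is just one possible way to avoid touching an erased instruction, not necessarily what clspv does or will do.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an IR function: instructions are just ids here.
struct ToyFunction {
  std::set<int> live;  // ids of instructions that still exist

  void erase(int id) { live.erase(id); }
  bool contains(int id) const { return live.count(id) != 0; }
};

// A pending rewrite collected up front: transform `inst`, folding in
// (and erasing) `source`.
struct PendingCast {
  int inst;
  int source;
};

// Applies the pending rewrites in order and returns the ids of the
// instructions that were actually rewritten. An entry whose instruction or
// source was erased by an earlier rewrite is skipped rather than used.
std::vector<int> applyRewrites(ToyFunction &f,
                               const std::vector<PendingCast> &work) {
  std::vector<int> rewritten;
  for (const PendingCast &p : work) {
    if (!f.contains(p.inst) || !f.contains(p.source)) {
      continue;  // stale entry: an earlier rewrite removed its basis
    }
    f.erase(p.source);  // the rewrite folds `source` into `inst`
    rewritten.push_back(p.inst);
  }
  return rewritten;
}
```

With instructions {1, 2, 3} and the worklist {3 from 2, 2 from 1}, the first rewrite erases instruction 2, so the second entry is stale and gets skipped. Without the check, the second entry would operate on an instruction that no longer exists.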

Here is a minimal testcase:

target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
target triple = "spir-unknown-unknown"

%struct.Line = type { i32, float, i32 }

define void @test(ptr addrspace(1) %lines, i32 %n) {
entry:
  %gep1 = getelementptr inbounds %struct.Line, ptr addrspace(1) %lines, i32 %n
  %gep2 = getelementptr inbounds i8, ptr addrspace(1) %gep1, i32 4
  store float 0.000000e+00, ptr addrspace(1) %gep2, align 4
  ret void
}

Run that source as test.ll using the command:

clspv-opt test.ll --passes=simplify-pointer-bitcast

The transform is also iterating in a non-deterministic manner, so I've seen a couple of different failure modes on the original source. I plan to break up the implicit cast transform into more pieces (with some redundancy) to avoid these issues.

rjodinchr commented 1 year ago

Yes, ImplicitCasts should not be a DenseMap.
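For context: a hash map keyed by pointers iterates in an order that depends on the pointer values, which can change from run to run. The sketch below shows the general idea of an insertion-ordered map in standard C++. `OrderedMap` and `keysInOrder` are hypothetical names for illustration; in LLVM code an order-preserving container such as `llvm::MapVector` would be the natural candidate, though which container the actual fix uses is not stated in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal insertion-ordered map (illustration only).
template <typename K, typename V>
class OrderedMap {
public:
  // Inserts (key, value) if the key is new; returns true when inserted.
  bool insert(const K &key, const V &value) {
    if (index_.count(key) != 0) return false;
    index_[key] = entries_.size();
    entries_.emplace_back(key, value);
    return true;
  }

  // Entries in the order they were first inserted.
  const std::vector<std::pair<K, V>> &entries() const { return entries_; }

private:
  std::unordered_map<K, std::size_t> index_;  // key -> position in entries_
  std::vector<std::pair<K, V>> entries_;      // insertion order
};

// Returns the keys in iteration order.
template <typename K, typename V>
std::vector<K> keysInOrder(const OrderedMap<K, V> &m) {
  std::vector<K> keys;
  for (const auto &e : m.entries()) keys.push_back(e.first);
  return keys;
}
```

Iterating `entries()` visits keys in insertion order regardless of their addresses, so a pass that walks the container sees the same sequence on every run.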

alan-baker commented 1 year ago

As I split this up, 3 tests have started to fail. I haven't debugged them all, but test6 in test/PointerCasts/opaque_trivial_casts.ll seems to have been hiding a bug. Here is a snippet of that file (including test5 due to similarities):


; CHECK-LABEL: define void @test5(ptr %in) {
; CHECK: entry:
; CHECK:   %0 = getelementptr i32, ptr %in, i32 2
; CHECK:   ret void
; CHECK: }

define void @test5(ptr %in) {
entry:
  %gep1 = getelementptr i32, ptr %in, i32 1
  %gep2 = getelementptr i32, ptr %gep1, i32 1
  ret void
}

; CHECK-LABEL: define void @test6(ptr %in) {
; CHECK: entry:
; CHECK:   getelementptr float, ptr %in, i32 1
; CHECK-NEXT: ret void
; CHECK: }

define void @test6(ptr %in) {
entry:
  %gep1 = getelementptr float, ptr %in, i32 1
  %gep2 = getelementptr i32, ptr %gep1, i32 1
  ret void
}

test5 seems correct in that it calculates an 8-byte offset from %in, but test6 seems incorrect because its expected output only calculates a 4-byte offset from %in, while the input GEP chain also adds up to 8 bytes.
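The arithmetic behind that comparison can be checked with a small sketch. `GepStep` and `byteOffset` are hypothetical helpers, and the only assumption is that `i32` and `float` are each 4 bytes wide.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One single-index GEP step: `index` elements of a type that is
// `elemSize` bytes wide.
struct GepStep {
  std::int64_t elemSize;
  std::int64_t index;
};

// Total byte offset from the base pointer for a chain of GEP steps.
std::int64_t byteOffset(const std::vector<GepStep> &chain) {
  std::int64_t total = 0;
  for (const GepStep &s : chain) total += s.elemSize * s.index;
  return total;
}
```

For test5, the input chain `{{4, 1}, {4, 1}}` and the expected output `{{4, 2}}` both give 8 bytes. For test6, the input chain `{{4, 1}, {4, 1}}` (float then i32) also gives 8 bytes, but the CHECK line `{{4, 1}}` gives only 4.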

alan-baker commented 1 year ago

The bad test seems to stem from the pointer being in the private address space. The same test gives expected results if the pointer is in addrspace 1, for example.

alan-baker commented 1 year ago

All tests are passing after the refactor, but the minimal case for the original bug (see https://github.com/google/clspv/issues/1113#issuecomment-1551543610) appears to be transformed incorrectly.

The current result is:

target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
target triple = "spir-unknown-unknown"

%struct.Line = type { i32, float, i32 }

define void @test(ptr addrspace(1) %lines, i32 %n) {
entry:
  %0 = getelementptr %struct.Line, ptr addrspace(1) %lines, i32 %n
  store float 0.000000e+00, ptr addrspace(1) %0, align 4
  ret void
}

alan-baker commented 1 year ago

The problem boils down to recalculating the GEP indices. The struct is 12 bytes, so the math used to calculate the index is (1 / 12), which loses resolution: the integer division truncates to 0 and the offset into the struct is dropped.
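One way to avoid losing the sub-struct part is to keep the remainder of the division and map it to a field. The sketch below is an illustration under stated assumptions, not clspv's implementation: `StructLayout`, `Decomposed`, and `decompose` are hypothetical names, and the layout of `%struct.Line` is taken to be fields at byte offsets 0, 4, and 8 with a total size of 12.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte layout of a struct type (illustration only).
struct StructLayout {
  std::int64_t size;                       // total size in bytes
  std::vector<std::int64_t> fieldOffsets;  // byte offset of each field
};

// A byte offset split into whole structs plus a field within the struct.
struct Decomposed {
  std::int64_t elementIndex;  // whole structs skipped
  int fieldIndex;             // field starting at the remainder, -1 if none
};

// Splits a non-negative byte offset into (element index, field index).
// The quotient alone would discard any offset smaller than the struct size,
// so the remainder is kept and matched against the field offsets.
Decomposed decompose(const StructLayout &layout, std::int64_t byteOffset) {
  assert(layout.size > 0 && byteOffset >= 0);
  Decomposed d;
  d.elementIndex = byteOffset / layout.size;
  const std::int64_t rem = byteOffset % layout.size;
  d.fieldIndex = -1;
  for (std::size_t i = 0; i < layout.fieldOffsets.size(); ++i) {
    if (layout.fieldOffsets[i] == rem) d.fieldIndex = static_cast<int>(i);
  }
  return d;
}
```

For the minimal testcase, the `i8` GEP adds 4 bytes: the quotient is 0, but the remainder 4 selects field 1 (the `float`), which is the part the current output drops.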
