intel / intel-graphics-compiler

Intended debug flow for libigc failures? #206

Closed kurapov-peter closed 2 years ago

kurapov-peter commented 3 years ago

Hi, I'm using LLVM 12 (built from source) and the Khronos SPIR-V translator to generate OpenCL-like IR and then SPIR-V, which I build and run with compute-runtime. The flow works just fine when I use the release packages, yet I sometimes hit failures in libigc. I used to build debug versions of IGC, NEO and the loader to track such issues down, but after updating the components to the latest release tags (and switching to LLVM 11) I get an assertion failure when libigc is dlopened (my app first opens ze_loader and libLLVM-12, and then tries to dlopen libigc at runtime):

IGC/llvm-deps/src/llvm/include/llvm/Support/CommandLine.h:851: void llvm::cl::parser<DataType>::addLiteralOption(llvm::StringRef, const DT&, llvm::StringRef) [with DT = llvm::FunctionPass* (*)(); DataType = llvm::FunctionPass* (*)()]: Assertion `findOption(Name) == Values.size() && "Option already exists!"' failed.

I tried building my application with -fvisibility=hidden, with no luck. The first instance of LLVM (12) is not loaded through dlopen, so I can't, for example, make its symbols local that way.

Anything I'm missing here? Or is there an intended way for debugging issues with arbitrary code generation? The build log provided by the driver didn't help.

kurapov-peter commented 3 years ago

I'm able to reproduce the issue with a simple example at https://github.com/kurapov-peter/L0Snippets/blob/main/complete_flow.cpp. Build command:

g++ ../complete_flow.cpp -o complete -Wl,-rpath,/gfx_deps/neo_loader/lib /gfx_deps/neo_loader/lib/libze_loader.so -lLLVMSPIRVLib -L /path/to/llvm12/lib -lLLVM -L /path/to/llvm12/lib -I /path/to/llvm12/include/

IGC is built as in https://github.com/kurapov-peter/gfx_deps/blob/main/build_igc.sh

The assertion fails when IGC is dlopened, as LLVM 11 tries to register the RegAlloc pass within the same process as the application (on another thread):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff1d33859 in __GI_abort () at abort.c:79
#2  0x00007ffff1d33729 in __assert_fail_base (fmt=0x7ffff1ec9588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7fffda6fe368 "findOption(Name) == Values.size() && \"Option already exists!\"", 
    file=0x7fffda6fe318 "/gfx_deps/build/igc/IGC/llvm-deps/src/llvm/include/llvm/Support/CommandLine.h", line=851, function=<optimized out>) at assert.c:92
#3  0x00007ffff1d44f36 in __GI___assert_fail (assertion=0x7fffda6fe368 "findOption(Name) == Values.size() && \"Option already exists!\"", file=0x7fffda6fe318 "/gfx_deps/build/igc/IGC/llvm-deps/src/llvm/include/llvm/Support/CommandLine.h", line=851, 
    function=0x7fffda6fe268 "void llvm::cl::parser<DataType>::addLiteralOption(llvm::StringRef, const DT&, llvm::StringRef) [with DT = llvm::FunctionPass* (*)(); DataType = llvm::FunctionPass* (*)()]") at assert.c:101
#4  0x00007fffd878edee in llvm::cl::parser<llvm::FunctionPass* (*)()>::addLiteralOption<llvm::FunctionPass* (*)()> (this=0x7fffdd3c4968 <RegAlloc+168>, Name=..., V=@0x7fffffff6c70: 0x7fffd8785b69 <useDefaultRegisterAllocator()>, HelpStr=...)
    at /gfx_deps/build/igc/IGC/llvm-deps/src/llvm/include/llvm/Support/CommandLine.h:851
#5  0x00007fffd8792758 in llvm::RegisterPassParser<llvm::RegisterRegAlloc>::NotifyAdd (this=0x7fffdd3c4960 <RegAlloc+160>, N=..., C=0x7fffd8785b69 <useDefaultRegisterAllocator()>, D=...)
    at /gfx_deps/build/igc/IGC/llvm-deps/src/llvm/include/llvm/CodeGen/MachinePassRegistry.h:162
#6  0x00007fffd878ad99 in llvm::MachinePassRegistry<llvm::FunctionPass* (*)()>::Add (this=0x7ffff7e7ee40 <llvm::RegisterRegAllocBase<llvm::RegisterRegAlloc>::Registry>, Node=0x7fffdd3c4b80 <defaultRegAlloc>)
    at /gfx_deps/build/igc/IGC/llvm-deps/src/llvm/include/llvm/CodeGen/MachinePassRegistry.h:110
#7  0x00007fffd8788e40 in llvm::RegisterRegAllocBase<llvm::RegisterRegAlloc>::RegisterRegAllocBase (this=0x7fffdd3c4b80 <defaultRegAlloc>, N=0x7fffda6ff353 "default", D=0x7fffda6ff328 "pick register allocator based on -O option", 
    C=0x7fffd8785b69 <useDefaultRegisterAllocator()>) at /gfx_deps/build/igc/IGC/llvm-deps/src/llvm/include/llvm/CodeGen/RegAllocRegistry.h:37
#8  0x00007fffd8788c48 in llvm::RegisterRegAlloc::RegisterRegAlloc (this=0x7fffdd3c4b80 <defaultRegAlloc>, N=0x7fffda6ff353 "default", D=0x7fffda6ff328 "pick register allocator based on -O option", C=0x7fffd8785b69 <useDefaultRegisterAllocator()>)
    at /gfx_deps/build/igc/IGC/llvm-deps/src/llvm/include/llvm/CodeGen/RegAllocRegistry.h:63
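
For readers following the trace, the failure pattern can be pictured with a toy model. The sketch below is not LLVM's code: `OptionTable`, `registerDefaultAllocator` and `secondRegistrationFails` are hypothetical names, and it assumes that two LLVM copies end up registering the allocator name "default" in one shared table. The trace suggests this, but it is not confirmed in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a process-wide option table (hypothetical; not an LLVM class).
class OptionTable {
public:
    // Mirrors the spirit of the failing check: a name may be registered only once.
    void addLiteralOption(const std::string& name) {
        if (std::find(names_.begin(), names_.end(), name) != names_.end())
            throw std::logic_error("Option already exists: " + name);
        names_.push_back(name);
    }

    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;
};

// Assumption: every LLVM copy runs the same registration step during static
// initialization, using the same option name.
void registerDefaultAllocator(OptionTable& table) {
    table.addLiteralOption("default");
}

// Registers "default" twice against the same table and reports whether the
// second attempt was rejected.
bool secondRegistrationFails(OptionTable& table) {
    registerDefaultAllocator(table);  // first LLVM copy
    try {
        registerDefaultAllocator(table);  // second LLVM copy, same table
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

With a separate table per copy both registrations succeed. With one shared table the second one is rejected, which is the situation the assertion message describes.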

eero-t commented 2 years ago

Everything in the whole stack needs to use the same LLVM + LLVM-SPIRV-Translator + OpenCL-Clang version. Otherwise things break subtly or not so subtly (as I found out in another bug).

mnaczk commented 2 years ago

As @eero-t mentioned, IGC needs the same LLVM in the whole stack. If you use different LLVM versions for different components, we cannot guarantee that this will work, and we also cannot guarantee that it will fail gracefully (the LLVM API is not compatible across versions). Do you use the same LLVM version in the whole stack?

mnaczk commented 2 years ago

I am closing the issue due to the lack of response from the submitter. @kurapov-peter, if the problem still exists, please reopen the issue.