Closed ArbitRandomUser closed 8 months ago
Any of those instructions look familiar to you @chriselrod?
LV will generate llvm.assume. But it's not "real"; it makes no sense to call it.
I just tried it and got
julia> ptr = Libdl.dlopen("./mult2.so", Libdl.RTLD_NOW)#error
ERROR: could not load library "./mult2.so"
./mult2.so: undefined symbol: llvm.x86.bmi.bzhi.32
which I didn't see listed as U in your example.
LoopVectorization will try to use bmi.bzhi for creating masks when generating AVX512 code.
It requires the bmi extension, which every AVX512 CPU that LV supports has. But generic targets don't.
So I've seen issues like this before from PackageCompiler. LoopVectorization only knows how to check host features, not target features.
Does compile_shlib target a generic architecture?
While it might be possible to change, compile_shlib generally targets the host architecture, specifically whatever Clang_jll targets by default.
Ah, interesting..
Targeting the host should be fine, as that's what LV assumes.
Just tried on a CPU with AVX512 and I get the same error. In fact, the .so includes llvm.assume along with the above symbol:
[john@padmanabha3 staticjulia]$ nm mult2.so
0000000000202038 B __bss_start
0000000000202038 b completed.6355
w __cxa_finalize@@GLIBC_2.2.5
0000000000000650 t deregister_tm_clones
00000000000006c0 t __do_global_dtors_aux
0000000000201e00 t __do_global_dtors_aux_fini_array_entry
0000000000201e10 d __dso_handle
0000000000201e18 d _DYNAMIC
0000000000202038 D _edata
0000000000202040 B _end
00000000000011a4 T _fini
0000000000000700 t frame_dummy
0000000000201df8 t __frame_dummy_init_array_entry
0000000000001358 r __FRAME_END__
0000000000202000 d _GLOBAL_OFFSET_TABLE_
w __gmon_start__
00000000000012a0 r __GNU_EH_FRAME_HDR
00000000000005d0 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
0000000000201e08 d __JCR_END__
0000000000201e08 d __JCR_LIST__
w _Jv_RegisterClasses
U llvm.assume
U llvm.assume.renamed
U llvm.x86.bmi.bzhi.32
U llvm.x86.bmi.bzhi.32.renamed
0000000000000680 t register_tm_clones
0000000000000740 T test
0000000000000f50 T test2
0000000000202038 d __TMC_END__
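A quick way to spot this class of problem after a build is to scan the nm output for undefined (U) symbols whose names start with llvm., since LLVM intrinsics should never survive as linker-level symbols. The following is only a sketch: the function name undefined_llvm_symbols is made up for illustration, and it assumes the usual nm layout in which undefined symbols have no address column.

```julia
# Hypothetical helper: list undefined ("U") symbols named "llvm.*" in `nm` output.
function undefined_llvm_symbols(nm_output::AbstractString)
    syms = String[]
    for line in eachline(IOBuffer(nm_output))
        fields = split(line)
        # Undefined symbols have no address: only the type letter and the name.
        if length(fields) == 2 && fields[1] == "U" && startswith(fields[2], "llvm.")
            push!(syms, String(fields[2]))
        end
    end
    return syms
end

# A few lines in the same shape as the listing above.
sample = """
0000000000000680 t register_tm_clones
                 U llvm.assume
                 U llvm.x86.bmi.bzhi.32
                 w __cxa_finalize@@GLIBC_2.2.5
0000000000000740 T test
"""
println(undefined_llvm_symbols(sample))
```

On a real build this could be fed with read(`nm mult2.so`, String), assuming binutils nm is on the PATH.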
Minimal example:
julia> using StaticCompiler, VectorizationBase, Libdl
julia> function test(n::Int64)::Float64
           VectorizationBase.assume(n>0)
           1/n
       end
test (generic function with 1 method)
julia> function test2(n::Int64)::Float64
           VectorizationBase.assume(n>0)
           2/n
       end
test2 (generic function with 1 method)
julia> compile_shlib([(test,(Int64,)),(test2,(Int64,))],filename="inv2",demangle=true)
"/home/chriselrod/Documents/progwork/cxx/LoopModels/inv2.so"
julia> ptr = Libdl.dlopen("./inv2.so", Libdl.RTLD_NOW)#error
ERROR: could not load library "./inv2.so"
./inv2.so: undefined symbol: llvm.assume
Stacktrace:
[1] dlopen(s::String, flags::UInt32; throw_error::Bool)
@ Base.Libc.Libdl ./libdl.jl:117
[2] dlopen(s::String, flags::UInt32)
@ Base.Libc.Libdl ./libdl.jl:116
[3] top-level scope
@ REPL[10]:1
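The dlopen failure above can be turned into a small post-build smoke test. This is a sketch, not part of StaticCompiler: loads_cleanly is a made-up name, and it relies only on the Libdl standard library, whose dlopen accepts throw_error=false and then returns nothing on failure.

```julia
using Libdl

# Hypothetical helper: try to load a shared library with eager symbol
# resolution and report whether it loaded without error.
function loads_cleanly(path::AbstractString)
    handle = Libdl.dlopen(path, Libdl.RTLD_NOW; throw_error=false)
    handle === nothing && return false   # e.g. missing file or undefined symbol
    Libdl.dlclose(handle)
    return true
end
```

For example, loads_cleanly("./inv2.so") || error("inv2.so has unresolved symbols or is missing") could run right after compile_shlib in a test script.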
Any idea why this seems to cause a problem here but not in the normal Julia compilation pathway? Maybe we need to add some cleaning passes or something?
No, but @pchintalapudi might.
Building with LLVM assertions, I get a different error:
julia> compile_shlib([(test,(Int64,)),(test2,(Int64,))],filename="inv2",demangle=true)
julia: /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/IR/Function.cpp:1815: llvm::Optional<llvm::Function*> llvm::Intrinsic::remangleIntrinsicFunction(llvm::Function*): Assertion `NewDecl->getFunctionType() == F->getFunctionType() && "Shouldn't change the signature"' failed.
[907917] signal (6.-6): Aborted
in expression starting at REPL[4]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f3b5dc9071a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
remangleIntrinsicFunction at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/IR/Function.cpp:1815
linkGlobalValueProto at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/IRMover.cpp:1047 [inlined]
materialize at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/IRMover.cpp:594
mapValue at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:350
remapInstruction at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:917
remapFunction at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:1015
flush at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:899
~FlushingMapper at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:1135 [inlined]
mapValue at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:1160
run at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/IRMover.cpp:1595 [inlined]
move at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/IRMover.cpp:1748
run at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/LinkModules.cpp:581 [inlined]
linkInModule at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/LinkModules.cpp:603
linkModules at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/LinkModules.cpp:619
LLVMLinkModules2 at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/LinkModules.cpp:629
LLVMLinkModules2 at /home/premc/.julia/packages/LLVM/Od0DH/lib/15/libLLVM_h.jl:4926 [inlined]
link! at /home/premc/.julia/packages/LLVM/Od0DH/src/linker.jl:4 [inlined]
#native_llvm_module#47 at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:603
native_llvm_module at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:596 [inlined]
#generate_obj#51 at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:706
generate_obj at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:697 [inlined]
#generate_shlib#40 at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:559
generate_shlib at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:551 [inlined]
#compile_shlib#31 at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:352
compile_shlib at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:335
compile_shlib at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:335
unknown function (ip: 0x7f3b5ca85a9d)
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
jl_apply at /home/premc/julia/src/julia.h:1969 [inlined]
do_call at /home/premc/julia/src/interpreter.c:125
eval_value at /home/premc/julia/src/interpreter.c:222
eval_stmt_value at /home/premc/julia/src/interpreter.c:173 [inlined]
eval_body at /home/premc/julia/src/interpreter.c:620
jl_interpret_toplevel_thunk at /home/premc/julia/src/interpreter.c:774
jl_toplevel_eval_flex at /home/premc/julia/src/toplevel.c:934
jl_toplevel_eval_flex at /home/premc/julia/src/toplevel.c:877
ijl_toplevel_eval_in at /home/premc/julia/src/toplevel.c:985
eval at ./boot.jl:383 [inlined]
eval_user_input at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:150
repl_backend_loop at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:246
#start_repl_backend#46 at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:231
start_repl_backend at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:228
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
#run_repl#59 at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:387
run_repl at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:373
jfptr_run_repl_91909 at /home/premc/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
#1037 at ./client.jl:432
jfptr_YY.1037_82773 at /home/premc/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
jl_apply at /home/premc/julia/src/julia.h:1969 [inlined]
jl_f__call_latest at /home/premc/julia/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:887 [inlined]
invokelatest at ./essentials.jl:884 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82799 at /home/premc/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
jl_apply at /home/premc/julia/src/julia.h:1969 [inlined]
true_main at /home/premc/julia/src/jlapi.c:582
jl_repl_entrypoint at /home/premc/julia/src/jlapi.c:731
main at /home/premc/julia/cli/loader_exe.c:58
unknown function (ip: 0x7f3b5dc91d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at ./julia (unknown line)
Allocations: 9273276 (Pool: 9256420; Big: 16856); GC: 6
Aborted
This indicates we're generating a malformed intrinsic, and indeed stepping through gdb we find an llvm.assume.1:
; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-linux-gnu"
define double @test(i64 signext %"n::Int64") local_unnamed_addr #0 {
top:
%0 = icmp sgt i64 %"n::Int64", 0
call void @llvm.assume(i1 %0)
%1 = sitofp i64 %"n::Int64" to double
%2 = fdiv double 1.000000e+00, %1
ret double %2
}
; Function Attrs: inaccessiblememonly nocallback nofree nosync nounwind willreturn
declare void @llvm.assume(i1 noundef %0) #1
define double @test2(i64 signext %"n::Int64") local_unnamed_addr #2 {
top:
%0 = icmp sgt i64 %"n::Int64", 0
call void @llvm.assume(i1 %0)
%1 = sitofp i64 %"n::Int64" to double
%2 = fdiv double 2.000000e+00, %1
ret double %2
}
; Function Attrs: inaccessiblememonly nocallback nofree nosync nounwind willreturn
declare void @llvm.assume.1(i1 noundef %0) #3
attributes #0 = { "probe-stack"="inline-asm" }
attributes #1 = { inaccessiblememonly nocallback nofree nosync nounwind willreturn }
attributes #2 = { "probe-stack"="inline-asm" }
attributes #3 = { inaccessiblememonly nocallback nofree nosync nounwind willreturn }
!llvm.module.flags = !{!0, !1}
!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
As for why LLVM claims it wants to replace that with a different intrinsic type, I'm not entirely sure.
Turns out it's not trying to replace with a different intrinsic type, it's that someone's trying to link two modules with different contexts together. Most likely this is an issue in StaticCompiler, where it doesn't ensure that the linked modules are in the same context?
(gdb) p NewDecl
$17 = <optimized out>
(gdb) p F->VTy->Context
$18 = (llvm::LLVMContext &) @0x555556541960: {pImpl = 0x55555818b450}
(gdb) p F->Parent->Context
$19 = (llvm::LLVMContext &) @0x555556dea9d0: {pImpl = 0x5555564cba80}
Yes, it looks like https://github.com/tshort/StaticCompiler.jl/blob/f552ce0ea3642653daf11533b8f8fef1add33d58/src/StaticCompiler.jl#L602 is where the problem is. The method that compiles a single function activates and deactivates a new context. I don't know why this problem only turns up with LoopVectorization, though.
I could get it to work by creating a context, doing whatever native_llvm_module(ff,t) does inside native_llvm_module(funcs), and then deactivating the context after the loop.
I'll wait for your thoughts on the "right" way to implement this; I'd be more than happy to make a PR.
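To illustrate why the shared context matters, here is a toy model in plain Julia. Everything in it (ToyType, ToyContext, ToyModule, toy_compile, toy_link!) is hypothetical and is not the LLVM.jl or StaticCompiler API. The only thing it borrows from LLVM is the idea that type objects are owned by a context, so two declarations of llvm.assume built in different contexts never share the same type object.

```julia
# Toy model only: these names are hypothetical and are NOT the LLVM.jl API.
mutable struct ToyType            # mutable, so `===` compares object identity
    signature::String
end

struct ToyContext
    interned::Dict{String,ToyType}
end
ToyContext() = ToyContext(Dict{String,ToyType}())

# Return the unique type object for `sig` within `ctx`.
intern!(ctx::ToyContext, sig::String) = get!(() -> ToyType(sig), ctx.interned, sig)

struct ToyModule
    ctx::ToyContext
    decls::Dict{String,ToyType}   # declared function name => its type
end

# "Compile" one function: declare llvm.assume with a type owned by `ctx`.
toy_compile(ctx::ToyContext) =
    ToyModule(ctx, Dict("llvm.assume" => intern!(ctx, "void (i1)")))

# Link `src` into `dst`; a shared declaration must have the identical type object.
function toy_link!(dst::ToyModule, src::ToyModule)
    for (name, ty) in src.decls
        if haskey(dst.decls, name) && dst.decls[name] !== ty
            error("signature mismatch for $name: types come from different contexts")
        end
        dst.decls[name] = ty
    end
    return dst
end

# One fresh context per function, as in the failing path.
function link_with_separate_contexts(n::Int)
    dst = toy_compile(ToyContext())
    for _ in 2:n
        toy_link!(dst, toy_compile(ToyContext()))
    end
    return dst
end

# One context for the whole batch, as in the workaround described above.
function link_with_shared_context(n::Int)
    ctx = ToyContext()
    dst = toy_compile(ctx)
    for _ in 2:n
        toy_link!(dst, toy_compile(ctx))
    end
    return dst
end
```

In this model link_with_shared_context(2) succeeds, while link_with_separate_contexts(2) raises the mismatch error, loosely mirroring the "Shouldn't change the signature" assertion seen in the assertions build.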
i was able to get it working with this https://github.com/tshort/StaticCompiler.jl/compare/master...ArbitRandomUser:StaticCompiler.jl:contextfix
Go ahead on the PR. At first glance, that fix looks reasonable to me.
I seem to be running into errors with demangling in the tests. It seems that no matter what (demangle=true or false), the names don't get mangled.
I'm a little new to using GPUCompiler; I'll look around. Meanwhile, if it's apparent to you why, let me know.
Managed to fix that issue; tests are passing as well.
mwe...
The former compilation, generating separate .o files for both functions and compiling them to a shared library with clang, works. The latter way, compiling both functions into a single .so from within Julia, includes undefined symbols in the .so.
This is specific to when both functions use the @turbo macro from LoopVectorization. If either of them does not, the generated .so works fine.
project: