tshort / StaticCompiler.jl

Compiles Julia code to a standalone library (experimental)

undefined symbols in .so file #138

Closed: ArbitRandomUser closed this issue 8 months ago

ArbitRandomUser commented 11 months ago

MWE:

using StaticCompiler,LoopVectorization
using BenchmarkTools,Libdl
function test(n::Int64)::Float64
    r = 0.0
    @turbo for i=1:n
        r+= log(sqrt(i))
    end
    return r/n
end

function test2(n::Int64)::Float64
    r=0.0
    @turbo for i=1:n
        r+=sqrt(i)
    end
    return r/n
end

# Approach 1: one object file per function, linked into a shared library with clang
StaticCompiler.generate_obj(test,(Int,),true,"./mult","multf1")
StaticCompiler.generate_obj(test2,(Int,),true,"./mult","multf2")
run(`clang -shared ./mult/multf1.o ./mult/multf2.o -o mult1.so`)
ptr = Libdl.dlopen("./mult1.so", Libdl.RTLD_NOW)
fptr1 = Libdl.dlsym(ptr, "test")
ccall(fptr1, Float64, (Int64,), 100_000) # works fine

print("done")

# Approach 2: both functions compiled into a single shared library from within Julia
compile_shlib([(test,(Int64,)),(test2,(Int64,))], filename="mult2", demangle=true)
ptr = Libdl.dlopen("./mult2.so", Libdl.RTLD_NOW) # error: the .so has undefined symbols
fptr2 = Libdl.dlsym(ptr, "test2")
ccall(fptr2, Float64, (Int64,), 100_000)

The first approach, generating separate .o files for both functions and linking them into a shared library with clang, works. The second approach, compiling both functions into a single .so from within Julia, produces a .so with undefined symbols:

shell> nm mult2.so
                 w __cxa_finalize@GLIBC_2.2.5
0000000000005010 d __dso_handle
0000000000004df8 d _DYNAMIC
00000000000026b8 t _fini
0000000000004fe8 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000030f0 r __GNU_EH_FRAME_HDR
0000000000001000 t _init
0000000000001120 t isinf_2968
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000002480 t julia_isinf_2952
                 U llvm.assume
                 U llvm.assume.renamed
00000000000011b0 T test
00000000000024d0 T test2
0000000000005018 d __TMC_END__

This only happens when both functions use the @turbo macro from LoopVectorization. If either of them does not, the generated .so works fine.
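A quick way to catch this before calling `dlopen` is to scan `nm` output for undefined symbols whose names start with `llvm.`. Intrinsics are normally lowered to instructions or library calls during code generation (or, like `llvm.assume`, dropped entirely), so any that survive as `U` entries point to a broken build. The sketch below is a hypothetical helper, `undefined_llvm_symbols`, not part of StaticCompiler; it assumes binutils' default textual `nm` format.

```julia
# Hypothetical helper (not part of StaticCompiler): collect undefined symbols
# from `nm` output whose names look like LLVM intrinsics, e.g. `llvm.assume`.
# These should never appear as unresolved symbols in a finished shared library.
function undefined_llvm_symbols(nm_output::AbstractString)
    syms = String[]
    for line in eachline(IOBuffer(nm_output))
        fields = split(line)
        # Undefined entries have no address column, so they split into
        # exactly two fields: "<type> <name>", with type "U".
        if length(fields) == 2 && fields[1] == "U" && startswith(fields[2], "llvm.")
            push!(syms, String(fields[2]))
        end
    end
    return syms
end

# Usage on a real library (assumes `nm` from binutils is on PATH):
# undefined_llvm_symbols(read(`nm mult2.so`, String))
```

Weak (`w`) and defined (`T`, `t`, `d`, ...) entries are ignored, so libc-related weak symbols such as `__gmon_start__` do not produce false positives.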

Project:

(staticjulia) pkg> st
Status `~/staticjulia/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [61eb1bfa] GPUCompiler v0.21.4
  [929cbde3] LLVM v6.1.0
  [bdcacae8] LoopVectorization v0.12.165
  [90137ffa] StaticArrays v1.6.2
  [81625895] StaticCompiler v0.5.3
  [86c06d3c] StaticTools v0.8.8
brenhinkeller commented 11 months ago

Any of those instructions look familiar to you @chriselrod?

chriselrod commented 11 months ago

LV will generate llvm.assume, but it's not "real": it's an intrinsic that only passes a hint to the optimizer, so it makes no sense to actually call it.

chriselrod commented 11 months ago

I just tried it and got

julia> ptr = Libdl.dlopen("./mult2.so", Libdl.RTLD_NOW)#error
ERROR: could not load library "./mult2.so"
./mult2.so: undefined symbol: llvm.x86.bmi.bzhi.32

which I didn't see listed as U in your example. LoopVectorization will try to use bzhi to create masks when generating AVX512 code. That instruction requires the BMI2 extension, which every AVX512 CPU that LV supports has, but generic targets don't.
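For readers unfamiliar with BZHI ("zero high bits starting with specified bit position"): it copies its source operand and clears every bit at or above a given index, which makes it a one-instruction way to build a mask for the first `r` lanes of a partially filled vector. The sketch below is a plain-Julia emulation of the 32-bit form for illustration only; `bzhi32` and `lane_mask` are hypothetical names, not LoopVectorization's code, and the 16-lane width assumes 512-bit vectors of `Float32`.

```julia
# Illustrative emulation of 32-bit BZHI semantics (not LoopVectorization's code).
function bzhi32(x::UInt32, n::Integer)
    idx = n & 0xff              # the hardware reads only the low 8 bits of the index
    idx >= 32 && return x       # index at or past the operand width: x passes through
    return x & ((one(UInt32) << idx) - one(UInt32))  # keep bits 0..idx-1
end

# Hypothetical remainder mask for a 16-lane vector (16 x Float32 = 512 bits):
# bit i is set when lane i is still inside the loop bounds.
lane_mask(r::Integer) = bzhi32(typemax(UInt32), r) % UInt16
```

For example, a loop with 3 leftover iterations would use `lane_mask(3)`, i.e. only the lowest three lanes enabled.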

I've seen issues like this before with PackageCompiler: LoopVectorization only knows how to check host features, not target features.

Does compile_shlib target a generic architecture?

brenhinkeller commented 11 months ago

While it might be possible to change, compile_shlib generally targets the host architecture, specifically whatever Clang_jll targets by default

Ah, interesting.

chriselrod commented 11 months ago

Targeting the host should be fine, as that's what LV assumes.

ArbitRandomUser commented 11 months ago

I just tried on a CPU with AVX512 and I get the same error; in fact, the .so includes llvm.assume along with the llvm.x86.bmi.bzhi.32 symbol:

[john@padmanabha3 staticjulia]$ nm mult2.so
0000000000202038 B __bss_start
0000000000202038 b completed.6355
                 w __cxa_finalize@@GLIBC_2.2.5
0000000000000650 t deregister_tm_clones
00000000000006c0 t __do_global_dtors_aux
0000000000201e00 t __do_global_dtors_aux_fini_array_entry
0000000000201e10 d __dso_handle
0000000000201e18 d _DYNAMIC
0000000000202038 D _edata
0000000000202040 B _end
00000000000011a4 T _fini
0000000000000700 t frame_dummy
0000000000201df8 t __frame_dummy_init_array_entry
0000000000001358 r __FRAME_END__
0000000000202000 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000012a0 r __GNU_EH_FRAME_HDR
00000000000005d0 T _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000201e08 d __JCR_END__
0000000000201e08 d __JCR_LIST__
                 w _Jv_RegisterClasses
                 U llvm.assume
                 U llvm.assume.renamed
                 U llvm.x86.bmi.bzhi.32
                 U llvm.x86.bmi.bzhi.32.renamed
0000000000000680 t register_tm_clones
0000000000000740 T test
0000000000000f50 T test2
0000000000202038 d __TMC_END__
chriselrod commented 11 months ago

Minimal example:

julia> using StaticCompiler, VectorizationBase, Libdl

julia> function test(n::Int64)::Float64
           VectorizationBase.assume(n>0)
           1/n
       end
test (generic function with 1 method)

julia> function test2(n::Int64)::Float64
           VectorizationBase.assume(n>0)
           2/n
       end
test2 (generic function with 1 method)

julia> compile_shlib([(test,(Int64,)),(test2,(Int64,))],filename="inv2",demangle=true)
"/home/chriselrod/Documents/progwork/cxx/LoopModels/inv2.so"

julia> ptr = Libdl.dlopen("./inv2.so", Libdl.RTLD_NOW)#error
ERROR: could not load library "./inv2.so"
./inv2.so: undefined symbol: llvm.assume
Stacktrace:
 [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
   @ Base.Libc.Libdl ./libdl.jl:117
 [2] dlopen(s::String, flags::UInt32)
   @ Base.Libc.Libdl ./libdl.jl:116
 [3] top-level scope
   @ REPL[10]:1
brenhinkeller commented 11 months ago

Any idea why this seems to cause a problem here but not in the normal Julia compilation pathway? Maybe we need to add some cleaning passes or something?

chriselrod commented 11 months ago

No, but @pchintalapudi might.

pchintalapudi commented 11 months ago

Building with LLVM assertions, I get a different error:

julia> compile_shlib([(test,(Int64,)),(test2,(Int64,))],filename="inv2",demangle=true)
julia: /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/IR/Function.cpp:1815: llvm::Optional<llvm::Function*> llvm::Intrinsic::remangleIntrinsicFunction(llvm::Function*): Assertion `NewDecl->getFunctionType() == F->getFunctionType() && "Shouldn't change the signature"' failed.

[907917] signal (6.-6): Aborted
in expression starting at REPL[4]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f3b5dc9071a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
remangleIntrinsicFunction at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/IR/Function.cpp:1815
linkGlobalValueProto at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/IRMover.cpp:1047 [inlined]
materialize at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/IRMover.cpp:594
mapValue at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:350
remapInstruction at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:917
remapFunction at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:1015
flush at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:899
~FlushingMapper at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:1135 [inlined]
mapValue at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Transforms/Utils/ValueMapper.cpp:1160
run at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/IRMover.cpp:1595 [inlined]
move at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/IRMover.cpp:1748
run at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/LinkModules.cpp:581 [inlined]
linkInModule at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/LinkModules.cpp:603
linkModules at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/LinkModules.cpp:619
LLVMLinkModules2 at /home/premc/julia/deps/srccache/llvm-julia-15.0.7-7/llvm/lib/Linker/LinkModules.cpp:629
LLVMLinkModules2 at /home/premc/.julia/packages/LLVM/Od0DH/lib/15/libLLVM_h.jl:4926 [inlined]
link! at /home/premc/.julia/packages/LLVM/Od0DH/src/linker.jl:4 [inlined]
#native_llvm_module#47 at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:603
native_llvm_module at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:596 [inlined]
#generate_obj#51 at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:706
generate_obj at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:697 [inlined]
#generate_shlib#40 at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:559
generate_shlib at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:551 [inlined]
#compile_shlib#31 at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:352
compile_shlib at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:335
compile_shlib at /home/premc/.julia/packages/StaticCompiler/LMT2M/src/StaticCompiler.jl:335
unknown function (ip: 0x7f3b5ca85a9d)
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
jl_apply at /home/premc/julia/src/julia.h:1969 [inlined]
do_call at /home/premc/julia/src/interpreter.c:125
eval_value at /home/premc/julia/src/interpreter.c:222
eval_stmt_value at /home/premc/julia/src/interpreter.c:173 [inlined]
eval_body at /home/premc/julia/src/interpreter.c:620
jl_interpret_toplevel_thunk at /home/premc/julia/src/interpreter.c:774
jl_toplevel_eval_flex at /home/premc/julia/src/toplevel.c:934
jl_toplevel_eval_flex at /home/premc/julia/src/toplevel.c:877
ijl_toplevel_eval_in at /home/premc/julia/src/toplevel.c:985
eval at ./boot.jl:383 [inlined]
eval_user_input at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:150
repl_backend_loop at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:246
#start_repl_backend#46 at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:231
start_repl_backend at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:228
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
#run_repl#59 at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:387
run_repl at /home/premc/julia/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:373
jfptr_run_repl_91909 at /home/premc/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
#1037 at ./client.jl:432
jfptr_YY.1037_82773 at /home/premc/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
jl_apply at /home/premc/julia/src/julia.h:1969 [inlined]
jl_f__call_latest at /home/premc/julia/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:887 [inlined]
invokelatest at ./essentials.jl:884 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82799 at /home/premc/julia/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/premc/julia/src/gf.c:2870 [inlined]
ijl_apply_generic at /home/premc/julia/src/gf.c:3071
jl_apply at /home/premc/julia/src/julia.h:1969 [inlined]
true_main at /home/premc/julia/src/jlapi.c:582
jl_repl_entrypoint at /home/premc/julia/src/jlapi.c:731
main at /home/premc/julia/cli/loader_exe.c:58
unknown function (ip: 0x7f3b5dc91d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at ./julia (unknown line)
Allocations: 9273276 (Pool: 9256420; Big: 16856); GC: 6
Aborted

This indicates we're generating a malformed intrinsic, and indeed, stepping through gdb, we find an llvm.assume.1:

; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-linux-gnu"

define double @test(i64 signext %"n::Int64") local_unnamed_addr #0 {
top:
  %0 = icmp sgt i64 %"n::Int64", 0
  call void @llvm.assume(i1 %0)
  %1 = sitofp i64 %"n::Int64" to double
  %2 = fdiv double 1.000000e+00, %1
  ret double %2
}

; Function Attrs: inaccessiblememonly nocallback nofree nosync nounwind willreturn
declare void @llvm.assume(i1 noundef %0) #1

define double @test2(i64 signext %"n::Int64") local_unnamed_addr #2 {
top:
  %0 = icmp sgt i64 %"n::Int64", 0
  call void @llvm.assume(i1 %0)
  %1 = sitofp i64 %"n::Int64" to double
  %2 = fdiv double 2.000000e+00, %1
  ret double %2
}

; Function Attrs: inaccessiblememonly nocallback nofree nosync nounwind willreturn
declare void @llvm.assume.1(i1 noundef %0) #3

attributes #0 = { "probe-stack"="inline-asm" }
attributes #1 = { inaccessiblememonly nocallback nofree nosync nounwind willreturn }
attributes #2 = { "probe-stack"="inline-asm" }
attributes #3 = { inaccessiblememonly nocallback nofree nosync nounwind willreturn }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}

As for why LLVM claims it wants to replace that with a different intrinsic type, I'm not entirely sure.

pchintalapudi commented 11 months ago

It turns out it's not trying to replace it with a different intrinsic type; rather, something is trying to link together two modules that live in different contexts. In the gdb session below, the function's type and its parent module belong to different LLVMContexts. Most likely this is an issue in StaticCompiler, which doesn't ensure that the linked modules are in the same context?

(gdb) p NewDecl
$17 = <optimized out>
(gdb) p F->VTy->Context
$18 = (llvm::LLVMContext &) @0x555556541960: {pImpl = 0x55555818b450}
(gdb) p F->Parent->Context
$19 = (llvm::LLVMContext &) @0x555556dea9d0: {pImpl = 0x5555564cba80}
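The assertion makes sense once you recall that LLVM uniques types per `LLVMContext` and compares them by pointer: two `void (i1)` function types created in different contexts print identically but are different objects, so `NewDecl->getFunctionType() == F->getFunctionType()` fails even though nothing about the signature changed. The toy model below illustrates that identity rule in plain Julia; `ToyContext`, `ToyFunctionType`, `intern!`, and `same_signature` are hypothetical stand-ins, not LLVM.jl types.

```julia
# Toy model, not LLVM.jl: each context interns (uniques) its own type objects,
# and types are compared by identity, like LLVM's pointer equality on Type*.
mutable struct ToyFunctionType   # mutable, so `===` compares object identity
    signature::String
end

struct ToyContext
    types::Dict{String,ToyFunctionType}
end
ToyContext() = ToyContext(Dict{String,ToyFunctionType}())

# Return the unique type object for `sig` within `ctx`, creating it on first use.
intern!(ctx::ToyContext, sig::AbstractString) =
    get!(() -> ToyFunctionType(String(sig)), ctx.types, String(sig))

# Mirrors the check in remangleIntrinsicFunction: same object, not just same text.
same_signature(a::ToyFunctionType, b::ToyFunctionType) = a === b
```

Within one context, asking twice for `void (i1)` yields the same object; across two contexts it yields two objects with equal text that still fail the identity check, which is why generating every module in a single shared context avoids the problem.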
ArbitRandomUser commented 9 months ago

Yes, it looks like https://github.com/tshort/StaticCompiler.jl/blob/f552ce0ea3642653daf11533b8f8fef1add33d58/src/StaticCompiler.jl#L602 is where the problem is: the method that compiles a single function activates and then deactivates a new context on each call. I don't know why this problem only turns up with LoopVectorization, though.

I could get it to work by creating a single context, doing whatever native_llvm_module(ff, t) does inside native_llvm_module(funcs), and deactivating the context after the loop. I'll wait for your thoughts on the "right" way to implement this; I'd be more than happy to make a PR.

I was able to get it working with this: https://github.com/tshort/StaticCompiler.jl/compare/master...ArbitRandomUser:StaticCompiler.jl:contextfix

tshort commented 9 months ago

Go ahead on the PR. At first glance, that fix looks reasonable to me.

ArbitRandomUser commented 9 months ago

I seem to be running into errors with demangling in the tests. It seems that no matter what (demangle=true or false), the names don't get mangled.

I'm a little new to using GPUCompiler, so I'll look around; meanwhile, if it's apparent to you why, let me know.

ArbitRandomUser commented 9 months ago

I managed to fix that issue, and the tests are passing as well.