tshort / StaticCompiler.jl

Compiles Julia code to a standalone library (experimental)

Incorrect return value of tuple with `compile_shlib` #101

Open baggepinnen opened 1 year ago

baggepinnen commented 1 year ago

The following code works as expected when calling `compile`, but returns the wrong result when using `compile_shlib`:

using StaticArrays, LinearAlgebra, StaticCompiler

T = Float32

Base.@ccallable function controller1(xt::Tuple{Float32,Float32}, ut::Tuple{Float32})::Tuple{Float32,Float32}
    T = Float32
    A_ = @SMatrix T[1 1; 0 1]
    B_ = @SMatrix T[0; 1]
    x = SVector(xt)
    u = SVector(ut)
    xp = A_ * x #+ B_ * u
    xp.data
end

x = @SVector randn(T, 2) 
u = @SVector randn(T, 1) 

x′ = controller1(x.data, u.data) # test
@code_warntype controller1(x.data, u.data) # checks out

argtypes_controller1 = Tuple{typeof(x.data), typeof(u.data)}
controller1_compiled, path_controller1 = compile(controller1, argtypes_controller1, "controller1") 
x′ = controller1_compiled(x.data, u.data) # Works fine

path_controller1 = compile_shlib(controller1, argtypes_controller1, "controller1")

function c_step(x, u)
    Libc.Libdl.dlopen(path_controller1) do lib
        fn = Libc.Libdl.dlsym(lib, :julia_controller1)
        @ccall $(fn)(x::Tuple{Float32, Float32}, u::Tuple{Float32})::Tuple{Float32, Float32}
    end
end

x′ = c_step(x.data, u.data)

julia> x′ = controller1_compiled(x.data, u.data) # Works fine
(-0.22692525f0, -1.4554836f0)

julia> x′ = c_step(x.data, u.data)
(0.0f0, 0.0f0)

If I uncomment the rest of `xp = A_ * x #+ B_ * u`, it still works with `compile`, but I instead get a segfault with `compile_shlib`.


julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
  Threads: 12 on 24 virtual cores

baggepinnen commented 1 year ago

A much smaller example

using LinearAlgebra, StaticCompiler

function controller3(xt::Tuple{Float64,Float64})::Tuple{Float64,Float64}
    xt
end

x = (randn(Float64, 2)...,)
x′ = controller3(x) # test
@code_warntype controller3(x) # checks out

argtypes_controller3 = Tuple{typeof(x)}
controller3_compiled, path_controller3 = compile(controller3, argtypes_controller3, "controller3") 
x′ = controller3_compiled(x) # Works fine

path_controller3 = compile_shlib(controller3, argtypes_controller3, "controller3")

function c_step(x)
    Libc.Libdl.dlopen(path_controller3) do lib
        fn = Libc.Libdl.dlsym(lib, :julia_controller3)
        ccall(fn, Tuple{Float64, Float64}, (Tuple{Float64, Float64}, ), x)
    end
end

x′ = c_step(x)

brenhinkeller commented 1 year ago

So as you may know, compile_shlib and compile_executable have quite a few more limitations than compile, because they don't link to libjulia.

Among other things, this means that while you can use types and dispatch as much as you want within your compiled function as long as everything's type-stable and inlined (since then your types all get compiled away), the same is not true if you try to return a Julia type from a function in a shlib. That shlib is just machine code, so has no awareness of Julia types, and while it may compile and return something if you tell it to return a Julia type, that something may not be what you expect.

Machine code of course cannot ever actually return a Julia type, only a native type (float, int/uint, bool, or pointer)! So if you want to compile something to native machine code and have it return an object of a Julia type (even something immutable, like a tuple), you'll have to figure out how Julia really does this under the hood. @ccall appears to be trying to do this for you, but evidently failing (I would guess due to hard-coding some pointer that is valid when compiled but not valid when later used).

The simplest way around this is to wrap all your Julia-typed objects in Refs (or anything else that lets you get a pointer to them) and pass around the pointers to both your inputs and your outputs as arguments -- for example:
https://github.com/brenhinkeller/StaticTools.jl#compiled-sodylib-shared-libraries
https://github.com/brenhinkeller/StaticTools.jl#calling-compiled-julia-library-from-julia
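
A minimal sketch of that pointer-passing pattern, applied to the smaller `controller3` example from this thread. The name `controller3_ptr` and its signature are hypothetical, and the function is called directly in the same session rather than through `compile_shlib`, so this only illustrates the calling pattern, not a verified fix:

```julia
# Hypothetical pointer-based variant of `controller3`: the caller passes
# pointers to the input and the output, and only a native Int32 is returned.
function controller3_ptr(out::Ptr{Float64}, xt::Ptr{Float64})::Int32
    for i in 1:2
        # Copy element i of the input tuple into the output slot.
        unsafe_store!(out, unsafe_load(xt, i), i)
    end
    return Int32(0)
end

x = (1.5, -2.0)
xin = Ref(x)             # input wrapped in a Ref
xout = Ref((0.0, 0.0))   # caller-allocated slot for the result

# Keep both Refs alive while raw pointers to them are in use.
GC.@preserve xin xout begin
    pin = convert(Ptr{Float64}, Base.unsafe_convert(Ptr{NTuple{2,Float64}}, xin))
    pout = convert(Ptr{Float64}, Base.unsafe_convert(Ptr{NTuple{2,Float64}}, xout))
    controller3_ptr(pout, pin)
end

println(xout[])
```

Because every argument and the return value are native types, the same signature could presumably be passed to `compile_shlib` and called with `ccall` using `Ptr{Float64}` argument types; that step is not shown or tested here.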

brenhinkeller commented 1 year ago

See also #100

baggepinnen commented 1 year ago

Yeah, I might have put too much hope into ccall understanding how to convert my tuple :/ Thanks for clarifying! What do you think it would take to "teach" ccall that an NTuple{N, T} maps to a C array of the corresponding C version of T, like NTuple{2, Float64} => double var[2]?

baggepinnen commented 1 year ago

And related, would it be possible to detect that the user is making such an error (using a Julia type) and throw a helpful error message?

brenhinkeller commented 1 year ago

So my guess is that ccall understands the memory layout of the tuple, but is looking for it in the wrong place...

As was part of the issue in #100, Julia often needs a place to put things when a function returns. In that case, this was causing calls to the GC to be added, but I suspect the exact same underlying problem exists here -- except in this case I suspect Julia is solving it differently (because there are no errors about missing "gc" / "alloc" functions). What I suspect is happening instead is that the Julia compiler is actually inserting a hard-coded pointer rather than inserting a call to the GC.

One way to check may be looking at the @code_llvm output for the function in question and looking for a hard-coded memory address. These are actually pretty common in Julia code; quite a number of Julia functions will compile to LLVM IR that simply hard-codes a pointer location. And as long as you're within the same Julia session, this could be a very efficient way of telling the code exactly where to look for something. However, as soon as you quit the Julia session where you did the compilation (or possibly even before then, if the memory in question gets GC'd!), that memory location will be invalid and you'll get wrong results and/or segfaults.
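
A rough sketch of such a check, using `code_llvm` from the `InteractiveUtils` standard library to capture the IR as a string. Searching for `inttoptr` is only a heuristic assumed here (hard-coded addresses usually appear as integer literals cast to pointers), not something StaticCompiler provides, and it may or may not find anything for a given function:

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Same identity function as in the smaller example in this thread.
controller3(xt::Tuple{Float64,Float64})::Tuple{Float64,Float64} = xt

# Capture the optimized LLVM IR as a string instead of printing it.
ir = sprint(io -> code_llvm(io, controller3, (Tuple{Float64,Float64},); debuginfo=:none))

# Heuristic (an assumption): list IR lines that cast an integer to a pointer.
suspicious = filter(l -> occursin("inttoptr", l), split(ir, '\n'))
foreach(println, suspicious)
println(isempty(suspicious) ? "no inttoptr found" :
        "found $(length(suspicious)) line(s) containing inttoptr")
```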

An error message for this is a great idea -- I'll try adding it as a warning for now so folks can still play around: https://github.com/tshort/StaticCompiler.jl/pull/102

jpsamaroo commented 1 year ago

Looking at @code_llvm, returning a Tuple uses the sret calling convention, which means that the first argument to the function is a slot allocated on the stack that the result will be stored to (and then the function just "returns" nothing). If you tell ccall that you're returning a Tuple, it will probably assume an sret calling convention, but I'm not sure of that.
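
One way to look at this without reading the whole IR dump is to print only the function signature line and check it for the `sret` attribute. This is a small sketch assuming the `controller3` definition from earlier in the thread; the result can depend on the Julia version and the return type:

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

controller3(xt::Tuple{Float64,Float64})::Tuple{Float64,Float64} = xt

# Capture the LLVM IR as a string.
ir = sprint(io -> code_llvm(io, controller3, (Tuple{Float64,Float64},); debuginfo=:none))

# The line starting with "define" holds the return type and parameter attributes.
siglines = filter(l -> startswith(l, "define"), split(ir, '\n'))
if !isempty(siglines)
    println(first(siglines))
    println("uses sret: ", occursin("sret", first(siglines)))
end
```

If `sret` shows up there, the compiled function expects the caller to pass a pointer to the result slot as a hidden first argument, which a caller has to match exactly.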