thautwarm / Virtual.jl

MIT License
41 stars 1 forks source link

Any chance to support Parametric Methods? #4

Open peremato opened 2 years ago

peremato commented 2 years ago

With your Aminal example

@virtual fast_func(x::Animal, y::T) where T = error("No default method for score!")

one gets the error: LoadError: invalid function expression: (fast_func(x::Animal, y::T) where T) = begin

The same when overwriting with a Pametric type, for example:

struct Dog{R} <: Animal end
@override fast_func(x::Dog{R}, y::Int) where R  = 2 + y

one gets the error LoadError: invalid function expression: (fast_func(x::Dog{R}, y::Int) where R) = begin

thautwarm commented 2 years ago

The reason why @override cannot take type parameters is that @override have to provide enough information to create static dispatch. As for @virtual, it may have a chance to support type parameters, but so far it uses the type signature to query "compatible" methods where generics are not allowed.

You can just avoid type parameters:

struct Dog{R} <: Animal end
@override fast_func(x::Dog{R} where R, y::Int)  = 2 + y
# or just 
@override fast_func(x::Dog, y::Int)  = 2 + y

Note that non-concrete parameter types might cause performance loss.

If you want to use type parameters, consider the following pattern:

function fast_func(x::Animal, y::T) where T 
     fast_func(x, y, T)
end

@virtual function _fast_func(x::Animal, y, T)
     ...
end
peremato commented 2 years ago

Thanks for your quite answer. My use case is a bit more complex. I have a reduced example that I attaching here

using StaticArrays
using BenchmarkTools
using Virtual

const Point3 = SVector{3}
const Vector3 = SVector{3}

abstract type AbstractShape{T<:AbstractFloat} end

struct Box{T<:AbstractFloat} <: AbstractShape{T}
    fDimensions::Vector3{T}
    Box{T}(x,y,z) where T = new((x,y,z))
end
struct Tube{T<:AbstractFloat} <: AbstractShape{T}
    rmin::T
    rmax::T
    z::T
    Tube{T}(rmin, rmax, z) where T = new(rmin, rmax, z)
end
struct Cone{T<:AbstractFloat} <: AbstractShape{T}
    r::T
    z::T
    Cone{T}(r, z) where T = new(r, z)
end
struct BooleanUnion{T<:AbstractFloat, SL<:AbstractShape{T}, SR<:AbstractShape{T}} <: AbstractShape{T}
    left::SL 
    right::SR
end
BooleanUnion(left::AbstractShape{T}, right::AbstractShape{T}) where T = BooleanUnion{T,typeof(left),typeof(right)}(left, right)

@virtual fast_inside(s::AbstractShape{Float64}, p::Point3{Float64}) = error("No default method for inside!")
@virtual fast_inside(s::AbstractShape{Float32}, p::Point3{Float32}) = error("No default method for inside!")

@override fast_inside(box::Box{Float64}, point::Point3{Float64}) = inside(box, point)
@override fast_inside(box::Box{Float32}, point::Point3{Float32}) = inside(box, point)
@override fast_inside(tube::Tube{Float64}, point::Point3{Float64}) = inside(tube, point)
@override fast_inside(tube::Tube{Float32}, point::Point3{Float32}) = inside(tube, point)
@override fast_inside(cone::Cone{Float64}, point::Point3{Float64}) = inside(cone, point)
@override fast_inside(cone::Cone{Float32}, point::Point3{Float32}) = inside(cone, point)
@override fast_inside(union::BooleanUnion{Float64, SL, SR} where {SL,SR}, point::Point3{Float64}) = inside(union, point)
@override fast_inside(union::BooleanUnion{Float32, SL, SR} where {SL,SR}, point::Point3{Float32}) = inside(union, point)

#---The real algorithms with papametric methods (ideally I need to keep as they are)
inside(box::Box{T}, point::Point3{T}) where T = maximum(abs.(point) - box.fDimensions) < T(0)
inside(tube::Tube{T}, point::Point3{T}) where T = tube.rmin^2 < point[1]^2+point[2]^2 < tube.rmax^2  && abs(point[3]) < tube.z
inside(cone::Cone{T}, point::Point3{T}) where T = point[1]^2+point[2]^2 <  ((cone.z-point[3])/2cone.z)^2 && abs(point[3]) < cone.z
inside(union::BooleanUnion{T,SL,SR}, point::Point3{T}) where {T,SL,SR} = inside(union.left, point) && inside(union.right, point)

box1 = Box{Float64}(1,2,3)
box2 = Box{Float64}(2,2,2)
tube = Tube{Float64}(1,2,3)
cone = Cone{Float64}(2,3)

const samples = (box1, box2, tube, cone, BooleanUnion(box1,tube), BooleanUnion(box2,tube), BooleanUnion(box1,box2) )
shapes = AbstractShape{Float64}[samples[rand(1:length(samples))] for i = 1:100]

const Shape{T} = Union{Box{T}, Tube{T}, Cone{T}, BooleanUnion{T}} where T<:AbstractFloat
ushapes = Shape{Float64}[samples[rand(1:length(samples))] for i = 1:100]

function inside_score(inside_func, point::Point3{T}, shapes::Vector{S}) where {T,S}
    s = 0
    for shape in shapes
        inside_func(shape, point) && ( s += 1)
    end
    return s
end

point = Point3{Float64}(0.9,1,1)
@info "fast_inside via Virtual.jl"
@btime inside_score(fast_inside, point, shapes)
@info "original inside with Vector of AbstractShapes"
@btime inside_score(inside, point, shapes)
@info "original inside with Vector of union Shape"
@btime inside_score(inside, point, ushapes)

What I get running this code is:

[ Info: fast_inside via Virtual.jl
  5.302 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of AbstractShapes
  5.109 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
  19.647 μs (300 allocations: 12.78 KiB)

The Union Shape is what I have been using until now. But it explodes when there are more that 3 shapes. As you can see I do not manage to improve the performance with respect the dynamic dispatch. Any suggestion?

peremato commented 2 years ago

If I reduce the number of Shapes, in particular the BooleanUnion, I can see that using Union instead of AbstractShape outspeed the other cases.

const samples = (box1, box2, tube, cone ) #, BooleanUnion(box1,tube), BooleanUnion(box2,tube), BooleanUnion(box1,box2) )
const Shape{T} = Union{Box{T}, Tube{T}, Cone{T} #=, BooleanUnion{T}=# } where T<:AbstractFloat

produces

[ Info: fast_inside via Virtual.jl
  3.216 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of AbstractShapes
  3.279 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
  347.521 ns (0 allocations: 0 bytes)
thautwarm commented 2 years ago

Yes, in your case, we cannot specify exact parameters for virtual function calls, which causes dynamic call. This is a bug, and can be fixed in Virtual.jl. The mechanism itself can optimize your case:

using StaticArrays
using BenchmarkTools
using Virtual

const Point3 = SVector{3}
const Vector3 = SVector{3}

abstract type AbstractShape{T<:AbstractFloat} end

struct Box{T<:AbstractFloat} <: AbstractShape{T}
    fDimensions::Vector3{T}
    Box{T}(x,y,z) where T = new((x,y,z))
end
struct Tube{T<:AbstractFloat} <: AbstractShape{T}
    rmin::T
    rmax::T
    z::T
    Tube{T}(rmin, rmax, z) where T = new(rmin, rmax, z)
end
struct Cone{T<:AbstractFloat} <: AbstractShape{T}
    r::T
    z::T
    Cone{T}(r, z) where T = new(r, z)
end
struct BooleanUnion{T<:AbstractFloat, SL<:AbstractShape{T}, SR<:AbstractShape{T}} <: AbstractShape{T}
    left::SL 
    right::SR
end
BooleanUnion(left::AbstractShape{T}, right::AbstractShape{T}) where T = BooleanUnion{T,typeof(left),typeof(right)}(left, right)

@virtual fast_inside(s::AbstractShape, p::Point3) = error("No default method for inside!")

@override fast_inside(box::Box{Float64}, point::Point3{Float64}) = inside(box, point)
@override fast_inside(box::Box{Float32}, point::Point3{Float32}) = inside(box, point)
@override fast_inside(tube::Tube{Float64}, point::Point3{Float64}) = inside(tube, point)
@override fast_inside(tube::Tube{Float32}, point::Point3{Float32}) = inside(tube, point)
@override fast_inside(cone::Cone{Float64}, point::Point3{Float64}) = inside(cone, point)
@override fast_inside(cone::Cone{Float32}, point::Point3{Float32}) = inside(cone, point)

@inline function fast_inside_nodynamic(s::Any, p::Any)
    sym_impl_func = Virtual.find_overrided_methods(Tuple{typeof(fast_inside), AbstractShape, Point3})
    Virtual.apply_switch(
        sym_impl_func, fast_inside, 
            Virtual.gen_wrap(AbstractShape, s),
            Virtual.gen_wrap(Point3, p))
end

#---The real algorithms with papametric methods (ideally I need to keep as they are)
inside(box::Box{T}, point::Point3{T}) where T = maximum(abs.(point) - box.fDimensions) < T(0)
inside(tube::Tube{T}, point::Point3{T}) where T = tube.rmin^2 < point[1]^2+point[2]^2 < tube.rmax^2  && abs(point[3]) < tube.z
inside(cone::Cone{T}, point::Point3{T}) where T = point[1]^2+point[2]^2 <  ((cone.z-point[3])/2cone.z)^2 && abs(point[3]) < cone.z
inside(union::BooleanUnion{T,SL,SR}, point::Point3{T}) where {T,SL,SR} = inside(union.left, point) && inside(union.right, point)

box1 = Box{Float64}(1,2,3)
box2 = Box{Float64}(2,2,2)
tube = Tube{Float64}(1,2,3)
cone = Cone{Float64}(2,3)

const samples = (box1, box2, tube, cone ) #, BooleanUnion(box1,tube), BooleanUnion(box2,tube), BooleanUnion(box1,box2) )
const Shape{T} = Union{Box{T}, Tube{T}, Cone{T} #=, BooleanUnion{T}=# } where T<:AbstractFloat
const shapes = AbstractShape{Float64}[samples[rand(1:length(samples))] for i = 1:100]
const ushapes = Shape{Float64}[samples[rand(1:length(samples))] for i = 1:100]

function inside_score(inside_func, point::Point3{T}, shapes::Vector{S}) where {T,S}
    s = 0
    for shape in shapes
        inside_func(shape, point) && ( s += 1)
    end
    return s
end

point = Point3{Float64}(0.9,1,1)

@info "fast_inside via Virtual.jl"
@btime inside_score(fast_inside_nodynamic, point, shapes)
@info "original inside with Vector of AbstractShapes"
@btime inside_score(inside, point, shapes)
@info "original inside with Vector of union Shape"
@btime inside_score(inside, point, ushapes)

produces:

[ Info: fast_inside via Virtual.jl
  278.912 ns (0 allocations: 0 bytes)
[ Info: original inside with Vector of AbstractShapes
  3.138 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
  297.541 ns (0 allocations: 0 bytes)

This should also work when the union types are more than 3.

peremato commented 2 years ago

Thanks very much. This is getting more promising. I noticed that you have removed the two overwrites

@override fast_inside(union::BooleanUnion{Float64, SL, SR} where {SL,SR}, point::Point3{Float64}) = inside(union, point)
@override fast_inside(union::BooleanUnion{Float32, SL, SR} where {SL,SR}, point::Point3{Float32}) = inside(union, point)

If I put them back the performance improvement is not factors anymore.

[ Info: fast_inside via Virtual.jl
  2.374 μs (0 allocations: 0 bytes)
[ Info: original inside with Vector of AbstractShapes
  3.120 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
  282.230 ns (0 allocations: 0 bytes)

Putting back the BooleanUnion instances in the example I do not get any improvement, all the contrary

[ Info: fast_inside via Virtual.jl
  22.212 μs (132 allocations: 6.88 KiB)
[ Info: original inside with Vector of AbstractShapes
  4.819 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
  18.894 μs (300 allocations: 12.94 KiB)

I learn with the discussion that Union splitting is very critical on the way the container is iterated. Just changing the inside_score function with

function inside_score(inside_func, point::Point3{T}, shapes::Vector{S}) where {T,S}
    s = 0
    for i in eachindex(shapes)
        inside_func(shapes[i], point) && ( s += 1)
    end
    return s
end

I recover the good performance of Union

[ Info: fast_inside via Virtual.jl
  16.641 μs (102 allocations: 5.31 KiB)
[ Info: original inside with Vector of AbstractShapes
  4.778 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
  1.397 μs (42 allocations: 1.31 KiB)

However, I will probably hit a limit when exceeding a number of types in the Union.

thautwarm commented 2 years ago

That's true. Checking isa for a complex generic type is too heavy, and such cases do not fit using Virtual.jl.