Open peremato opened 2 years ago
The reason why @override
cannot take type parameters is that @override
have to provide enough information to create static dispatch. As for @virtual
, it may have a chance to support type parameters, but so far it uses the type signature to query "compatible" methods where generics are not allowed.
You can just avoid type parameters:
struct Dog{R} <: Animal end
@override fast_func(x::Dog{R} where R, y::Int) = 2 + y
# or just
@override fast_func(x::Dog, y::Int) = 2 + y
Note that non-concrete parameter types might cause performance loss.
If you want to use type parameters, consider the following pattern:
function fast_func(x::Animal, y::T) where T
fast_func(x, y, T)
end
@virtual function _fast_func(x::Animal, y, T)
...
end
Thanks for your quite answer. My use case is a bit more complex. I have a reduced example that I attaching here
using StaticArrays
using BenchmarkTools
using Virtual
const Point3 = SVector{3}
const Vector3 = SVector{3}
abstract type AbstractShape{T<:AbstractFloat} end
struct Box{T<:AbstractFloat} <: AbstractShape{T}
fDimensions::Vector3{T}
Box{T}(x,y,z) where T = new((x,y,z))
end
struct Tube{T<:AbstractFloat} <: AbstractShape{T}
rmin::T
rmax::T
z::T
Tube{T}(rmin, rmax, z) where T = new(rmin, rmax, z)
end
struct Cone{T<:AbstractFloat} <: AbstractShape{T}
r::T
z::T
Cone{T}(r, z) where T = new(r, z)
end
struct BooleanUnion{T<:AbstractFloat, SL<:AbstractShape{T}, SR<:AbstractShape{T}} <: AbstractShape{T}
left::SL
right::SR
end
BooleanUnion(left::AbstractShape{T}, right::AbstractShape{T}) where T = BooleanUnion{T,typeof(left),typeof(right)}(left, right)
@virtual fast_inside(s::AbstractShape{Float64}, p::Point3{Float64}) = error("No default method for inside!")
@virtual fast_inside(s::AbstractShape{Float32}, p::Point3{Float32}) = error("No default method for inside!")
@override fast_inside(box::Box{Float64}, point::Point3{Float64}) = inside(box, point)
@override fast_inside(box::Box{Float32}, point::Point3{Float32}) = inside(box, point)
@override fast_inside(tube::Tube{Float64}, point::Point3{Float64}) = inside(tube, point)
@override fast_inside(tube::Tube{Float32}, point::Point3{Float32}) = inside(tube, point)
@override fast_inside(cone::Cone{Float64}, point::Point3{Float64}) = inside(cone, point)
@override fast_inside(cone::Cone{Float32}, point::Point3{Float32}) = inside(cone, point)
@override fast_inside(union::BooleanUnion{Float64, SL, SR} where {SL,SR}, point::Point3{Float64}) = inside(union, point)
@override fast_inside(union::BooleanUnion{Float32, SL, SR} where {SL,SR}, point::Point3{Float32}) = inside(union, point)
#---The real algorithms with papametric methods (ideally I need to keep as they are)
inside(box::Box{T}, point::Point3{T}) where T = maximum(abs.(point) - box.fDimensions) < T(0)
inside(tube::Tube{T}, point::Point3{T}) where T = tube.rmin^2 < point[1]^2+point[2]^2 < tube.rmax^2 && abs(point[3]) < tube.z
inside(cone::Cone{T}, point::Point3{T}) where T = point[1]^2+point[2]^2 < ((cone.z-point[3])/2cone.z)^2 && abs(point[3]) < cone.z
inside(union::BooleanUnion{T,SL,SR}, point::Point3{T}) where {T,SL,SR} = inside(union.left, point) && inside(union.right, point)
box1 = Box{Float64}(1,2,3)
box2 = Box{Float64}(2,2,2)
tube = Tube{Float64}(1,2,3)
cone = Cone{Float64}(2,3)
const samples = (box1, box2, tube, cone, BooleanUnion(box1,tube), BooleanUnion(box2,tube), BooleanUnion(box1,box2) )
shapes = AbstractShape{Float64}[samples[rand(1:length(samples))] for i = 1:100]
const Shape{T} = Union{Box{T}, Tube{T}, Cone{T}, BooleanUnion{T}} where T<:AbstractFloat
ushapes = Shape{Float64}[samples[rand(1:length(samples))] for i = 1:100]
function inside_score(inside_func, point::Point3{T}, shapes::Vector{S}) where {T,S}
s = 0
for shape in shapes
inside_func(shape, point) && ( s += 1)
end
return s
end
point = Point3{Float64}(0.9,1,1)
@info "fast_inside via Virtual.jl"
@btime inside_score(fast_inside, point, shapes)
@info "original inside with Vector of AbstractShapes"
@btime inside_score(inside, point, shapes)
@info "original inside with Vector of union Shape"
@btime inside_score(inside, point, ushapes)
What I get running this code is:
[ Info: fast_inside via Virtual.jl
5.302 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of AbstractShapes
5.109 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
19.647 μs (300 allocations: 12.78 KiB)
The Union Shape is what I have been using until now. But it explodes when there are more that 3 shapes. As you can see I do not manage to improve the performance with respect the dynamic dispatch. Any suggestion?
If I reduce the number of Shapes, in particular the BooleanUnion, I can see that using Union instead of AbstractShape outspeed the other cases.
const samples = (box1, box2, tube, cone ) #, BooleanUnion(box1,tube), BooleanUnion(box2,tube), BooleanUnion(box1,box2) )
const Shape{T} = Union{Box{T}, Tube{T}, Cone{T} #=, BooleanUnion{T}=# } where T<:AbstractFloat
produces
[ Info: fast_inside via Virtual.jl
3.216 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of AbstractShapes
3.279 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
347.521 ns (0 allocations: 0 bytes)
Yes, in your case, we cannot specify exact parameters for virtual function calls, which causes dynamic call. This is a bug, and can be fixed in Virtual.jl. The mechanism itself can optimize your case:
using StaticArrays
using BenchmarkTools
using Virtual
const Point3 = SVector{3}
const Vector3 = SVector{3}
abstract type AbstractShape{T<:AbstractFloat} end
struct Box{T<:AbstractFloat} <: AbstractShape{T}
fDimensions::Vector3{T}
Box{T}(x,y,z) where T = new((x,y,z))
end
struct Tube{T<:AbstractFloat} <: AbstractShape{T}
rmin::T
rmax::T
z::T
Tube{T}(rmin, rmax, z) where T = new(rmin, rmax, z)
end
struct Cone{T<:AbstractFloat} <: AbstractShape{T}
r::T
z::T
Cone{T}(r, z) where T = new(r, z)
end
struct BooleanUnion{T<:AbstractFloat, SL<:AbstractShape{T}, SR<:AbstractShape{T}} <: AbstractShape{T}
left::SL
right::SR
end
BooleanUnion(left::AbstractShape{T}, right::AbstractShape{T}) where T = BooleanUnion{T,typeof(left),typeof(right)}(left, right)
@virtual fast_inside(s::AbstractShape, p::Point3) = error("No default method for inside!")
@override fast_inside(box::Box{Float64}, point::Point3{Float64}) = inside(box, point)
@override fast_inside(box::Box{Float32}, point::Point3{Float32}) = inside(box, point)
@override fast_inside(tube::Tube{Float64}, point::Point3{Float64}) = inside(tube, point)
@override fast_inside(tube::Tube{Float32}, point::Point3{Float32}) = inside(tube, point)
@override fast_inside(cone::Cone{Float64}, point::Point3{Float64}) = inside(cone, point)
@override fast_inside(cone::Cone{Float32}, point::Point3{Float32}) = inside(cone, point)
@inline function fast_inside_nodynamic(s::Any, p::Any)
sym_impl_func = Virtual.find_overrided_methods(Tuple{typeof(fast_inside), AbstractShape, Point3})
Virtual.apply_switch(
sym_impl_func, fast_inside,
Virtual.gen_wrap(AbstractShape, s),
Virtual.gen_wrap(Point3, p))
end
#---The real algorithms with papametric methods (ideally I need to keep as they are)
inside(box::Box{T}, point::Point3{T}) where T = maximum(abs.(point) - box.fDimensions) < T(0)
inside(tube::Tube{T}, point::Point3{T}) where T = tube.rmin^2 < point[1]^2+point[2]^2 < tube.rmax^2 && abs(point[3]) < tube.z
inside(cone::Cone{T}, point::Point3{T}) where T = point[1]^2+point[2]^2 < ((cone.z-point[3])/2cone.z)^2 && abs(point[3]) < cone.z
inside(union::BooleanUnion{T,SL,SR}, point::Point3{T}) where {T,SL,SR} = inside(union.left, point) && inside(union.right, point)
box1 = Box{Float64}(1,2,3)
box2 = Box{Float64}(2,2,2)
tube = Tube{Float64}(1,2,3)
cone = Cone{Float64}(2,3)
const samples = (box1, box2, tube, cone ) #, BooleanUnion(box1,tube), BooleanUnion(box2,tube), BooleanUnion(box1,box2) )
const Shape{T} = Union{Box{T}, Tube{T}, Cone{T} #=, BooleanUnion{T}=# } where T<:AbstractFloat
const shapes = AbstractShape{Float64}[samples[rand(1:length(samples))] for i = 1:100]
const ushapes = Shape{Float64}[samples[rand(1:length(samples))] for i = 1:100]
function inside_score(inside_func, point::Point3{T}, shapes::Vector{S}) where {T,S}
s = 0
for shape in shapes
inside_func(shape, point) && ( s += 1)
end
return s
end
point = Point3{Float64}(0.9,1,1)
@info "fast_inside via Virtual.jl"
@btime inside_score(fast_inside_nodynamic, point, shapes)
@info "original inside with Vector of AbstractShapes"
@btime inside_score(inside, point, shapes)
@info "original inside with Vector of union Shape"
@btime inside_score(inside, point, ushapes)
produces:
[ Info: fast_inside via Virtual.jl
278.912 ns (0 allocations: 0 bytes)
[ Info: original inside with Vector of AbstractShapes
3.138 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
297.541 ns (0 allocations: 0 bytes)
This should also work when the union types are more than 3.
Thanks very much. This is getting more promising. I noticed that you have removed the two overwrites
@override fast_inside(union::BooleanUnion{Float64, SL, SR} where {SL,SR}, point::Point3{Float64}) = inside(union, point)
@override fast_inside(union::BooleanUnion{Float32, SL, SR} where {SL,SR}, point::Point3{Float32}) = inside(union, point)
If I put them back the performance improvement is not factors anymore.
[ Info: fast_inside via Virtual.jl
2.374 μs (0 allocations: 0 bytes)
[ Info: original inside with Vector of AbstractShapes
3.120 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
282.230 ns (0 allocations: 0 bytes)
Putting back the BooleanUnion instances in the example I do not get any improvement, all the contrary
[ Info: fast_inside via Virtual.jl
22.212 μs (132 allocations: 6.88 KiB)
[ Info: original inside with Vector of AbstractShapes
4.819 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
18.894 μs (300 allocations: 12.94 KiB)
I learn with the discussion that Union splitting is very critical on the way the container is iterated. Just changing the inside_score
function with
function inside_score(inside_func, point::Point3{T}, shapes::Vector{S}) where {T,S}
s = 0
for i in eachindex(shapes)
inside_func(shapes[i], point) && ( s += 1)
end
return s
end
I recover the good performance of Union
[ Info: fast_inside via Virtual.jl
16.641 μs (102 allocations: 5.31 KiB)
[ Info: original inside with Vector of AbstractShapes
4.778 μs (100 allocations: 3.12 KiB)
[ Info: original inside with Vector of union Shape
1.397 μs (42 allocations: 1.31 KiB)
However, I will probably hit a limit when exceeding a number of types in the Union.
That's true. Checking isa
for a complex generic type is too heavy, and such cases do not fit using Virtual.jl.
With your Aminal example
one gets the error:
LoadError: invalid function expression: (fast_func(x::Animal, y::T) where T) = begin
The same when overwriting with a Pametric type, for example:
one gets the error
LoadError: invalid function expression: (fast_func(x::Dog{R}, y::Int) where R) = begin