stevengj opened 7 years ago
I just saw that. The implementation here has an optimized three-dimensional rule (I have also been planning to add an optimized two-dimensional one). Would you be interested in migrating it, or in fusing both implementations over there?
What do you gain from the optimization? HCubature precomputes the rule, so the actual application of the rule should be pretty fast.
@stevengj Since you're talking about optimizations: in the METADATA repository you claimed that HCubature's performance is a bit better than Cubature's. How did you measure that?
Using a very simple one-dimensional integrand (f(x) = x on [0, 1]), HCubature is two orders of magnitude slower than Cubature:
julia> using Cubature, HCubature, BenchmarkTools, StaticArrays
julia> @benchmark HCubature.hcubature(t -> t[1], SVector(0.0), SVector(1.0))
BenchmarkTools.Trial:
memory estimate: 5.44 KiB
allocs estimate: 60
--------------
minimum time: 119.574 μs (0.00% GC)
median time: 136.770 μs (0.00% GC)
mean time: 143.156 μs (4.10% GC)
maximum time: 44.287 ms (99.62% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark Cubature.hcubature(t -> t[1], [0.0], [1.0])
BenchmarkTools.Trial:
memory estimate: 2.52 KiB
allocs estimate: 61
--------------
minimum time: 2.970 μs (0.00% GC)
median time: 3.071 μs (0.00% GC)
mean time: 4.061 μs (20.14% GC)
maximum time: 4.780 ms (99.84% GC)
--------------
samples: 10000
evals/sample: 9
Or maybe I'm calling HCubature.hcubature in an inefficient way?
The 1d case is not as well optimized (it calls QuadGK), and usually if you are doing 1d integration it is with a less trivial integrand. If you go to 2d with a nontrivial integrand, so that you aren't just benchmarking the overhead of initializing the quadrature rule, then HCubature is slightly faster than Cubature:
julia> @benchmark HCubature.hcubature($(t -> sin(10t[1]*t[2])), $(SVector(0.,0.)), $(SVector(1.,1.)), rtol=1e-8)
BenchmarkTools.Trial:
memory estimate: 941.08 KiB
allocs estimate: 34938
--------------
minimum time: 1.372 ms (0.00% GC)
median time: 1.472 ms (0.00% GC)
mean time: 1.596 ms (4.92% GC)
maximum time: 4.261 ms (53.66% GC)
--------------
samples: 3128
evals/sample: 1
julia> @benchmark Cubature.hcubature($(t -> sin(10t[1]*t[2])), $(SVector(0.,0.)), $(SVector(1.,1.)), reltol=1e-8)
BenchmarkTools.Trial:
memory estimate: 1.93 MiB
allocs estimate: 42118
--------------
minimum time: 1.846 ms (0.00% GC)
median time: 1.981 ms (0.00% GC)
mean time: 2.155 ms (4.96% GC)
maximum time: 5.086 ms (34.67% GC)
--------------
samples: 2317
evals/sample: 1
(It will get even better once Julia can stack-allocate temporary structs; maintaining the heap of integration subregions is currently a bigger overhead than I would like.)
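The heap-of-subregions pattern being discussed can be illustrated with a toy 1d analogue. This is only a sketch, not HCubature's actual code: the names `Region`, `eval_region`, and `integrate` are made up here, Simpson's rule stands in for the real cubature rule, and a linear scan stands in for a real heap. The point it shows is that every subdivision allocates fresh `Region` objects, which is the overhead under discussion:

```julia
struct Region
    a::Float64; b::Float64   # endpoints of the subinterval
    I::Float64; E::Float64   # integral and error estimates
end

# Simpson's rule on [a, b].
simpson(f, a, b) = (b - a) / 6 * (f(a) + 4f((a + b) / 2) + f(b))

# Estimate the error on a region by comparing one Simpson panel to two.
function eval_region(f, a, b)
    I1 = simpson(f, a, b)
    m = (a + b) / 2
    I2 = simpson(f, a, m) + simpson(f, m, b)
    Region(a, b, I2, abs(I2 - I1))
end

function integrate(f, a, b; rtol = 1e-8, maxsplit = 10_000)
    regions = [eval_region(f, a, b)]   # each Region is a fresh heap allocation
    for _ in 1:maxsplit
        I = sum(r -> r.I, regions)
        E = sum(r -> r.E, regions)
        E <= rtol * abs(I) && return (I, E)
        _, i = findmax([r.E for r in regions])  # worst region (a real heap in practice)
        r = regions[i]
        m = (r.a + r.b) / 2
        regions[i] = eval_region(f, r.a, m)     # split the worst region in two
        push!(regions, eval_region(f, m, r.b))
    end
    return (sum(r -> r.I, regions), sum(r -> r.E, regions))
end
```

With stack-allocatable immutable structs, the `Region` values could live inside the worklist storage instead of being individually heap-allocated.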
(I just spotted the problem with the 1d case: it is re-computing the Kronrod rule, which involves solving an eigenproblem, every time. I thought the QuadGK.kronrod function cached the rule, but actually that is done elsewhere in QuadGK. I will update it to cache the rule and then I will try it again.)
Okay, the 1d case is now fixed too (or will be once I push a commit); thanks for spotting this!
julia> @btime HCubature.hcubature($(t -> t[1]), $(SVector(0.)), $(SVector(1.)), rtol=1e-8)
2.811 μs (36 allocations: 1008 bytes)
(0.5, 0.0)
julia> @btime Cubature.hcubature($(t -> t[1]), $(SVector(0.)), $(SVector(1.)), reltol=1e-8)
13.540 μs (82 allocations: 3.36 KiB)
(0.5, 5.551115123125783e-15)
Further speedups should be possible in the multidimensional case by caching the cubature rules there too, but I'm not sure it's worth it; normally I would expect multidimensional cubature to require enough function evaluations that the start-up cost of computing the rule doesn't matter?
For 2d integrals with a trivial integrand, I got a 40% speedup by caching the Genz–Malik rule. For more complex integrands that require adaptive refinement, the difference is negligible. But I guess integrating low-degree polynomials is pretty common (e.g. in FEM packages), and it only takes a few lines of code to cache the rule (for a given dimensionality and floating-point type), so I pushed that commit too.
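The caching described above can be sketched in a few lines, keyed on the dimensionality and floating-point type. Here a hypothetical `make_rule` stands in for the expensive Genz-Malik construction; HCubature's actual caching code may differ:

```julia
# Stand-in for the expensive rule construction (the real one builds
# Genz-Malik nodes and weights for dimension n in precision T).
make_rule(n::Int, ::Type{T}) where {T} = (dim = n, weights = ones(T, 2n + 1))

const RULE_CACHE = Dict{Tuple{Int,DataType},Any}()

# Construct the rule once per (dimension, type) pair and reuse it afterwards.
function cached_rule(n::Int, ::Type{T}) where {T<:AbstractFloat}
    get!(RULE_CACHE, (n, T)) do
        make_rule(n, T)
    end
end
```

Since the cache is keyed on `(n, T)`, repeated integrals of the same dimensionality and precision pay the construction cost only once.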
(Whoops, I just noticed that I acknowledged @pabloferz rather than @giordano in the commit message for https://github.com/stevengj/HCubature.jl/commit/5e1b559ca17f2d63d940cb803888957efe26b4fa, sorry.)
The rule used here for 3 dimensions is not the same as the original Genz-Malik rule; it is an optimization of that rule, in that it uses fewer function evaluations per region and is an 11th-degree rule instead of a 9th-degree one.
In the paper "An Adaptive Algorithm for the Approximate Calculation of Multiple Integrals" by Berntsen, Espelid, and Genz, they mention that there is a 13th-degree rule that can be used for 2-dimensional integrals, and the 11th-degree rule used here for 3d. For 4-dimensional or higher integrals I believe the only rule reported to date is the original Genz-Malik rule.
Additionally, I slightly modified the Berntsen-Espelid 11th-degree rule, originally reported in [REF], so that the computation of the error estimate in the 3d case is a bit faster.
[REF] Jarle Berntsen and Terje O. Espelid: On the construction of higher degree three dimensional embedded integration rules. Reports in Informatics, University of Bergen, May 1985.
Another thing I noticed is that HCubature also uses the original Genz-Malik algorithm. Here I used the algorithm from "An Adaptive Algorithm for the Approximate Calculation of Multiple Integrals" (it's actually a one-to-one implementation of the algorithm described there). The main differences are that it is supposed to give better error estimates and that it has a better region-subdivision scheme.
If you have a higher-degree rule that uses fewer function evaluations, that sounds great; a PR would be welcome (just put it into a separate file and define a new cubrule method that returns it for 3d). Do you have a reference for it?
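Assuming HCubature's internal `cubrule` dispatches on something like `cubrule(::Val{N}, ::Type{T})` (an assumption for illustration; the real signature may differ), the shape of such a PR could be sketched like this, with empty placeholder types standing in for the actual node/weight tables:

```julia
abstract type AbstractRule end

struct GenzMalik{N,T} <: AbstractRule end          # placeholder for the default rule
struct BerntsenEspelid3D{T} <: AbstractRule end    # placeholder for the 11th-degree 3d rule

# Default: the Genz-Malik rule for any dimension N and precision T.
cubrule(::Val{N}, ::Type{T}) where {N,T} = GenzMalik{N,T}()

# Override dimension 3 to return the specialized rule instead;
# the more specific Val{3} method wins under Julia's dispatch rules.
cubrule(::Val{3}, ::Type{T}) where {T} = BerntsenEspelid3D{T}()
```

Because the override is just an extra method, all other dimensions continue to use the default rule untouched.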
Regarding the 1991 Berntsen and Espelid paper you mention: from my understanding there are three main modifications compared to Genz and Malik. First, they handle vector-valued integrands, but HCubature handles this too (with a user-specifiable norm). Second, they have some advice about parallelization, but if I implement this I prefer the algorithm by Gladwell (1987) described here. Third, they describe a "two-level" error-estimation scheme that supposedly makes it more robust, but I remember looking at this at one point and being a bit skeptical; I wrote in my notes that it substantially increased the number of function evaluations for some of my test integrands. But maybe I should revisit it, at least as an option.
Now I remember: the rule used here doesn't use fewer function evaluations (127) than the Genz-Malik one (77) for 3D integrals, but being of higher degree it could in principle be faster for highly oscillatory functions. Regardless, I remember comparing the performance of the implementation here against Cubature (maybe that won't be the case with HCubature) and found this implementation (with the "two-level" error-estimation scheme) to be at least 2x faster even for low-degree polynomials.
Regarding parallelization I also thought that their strategy wasn't the best one.
For low-degree polynomials the error-estimation scheme is irrelevant. For a low-degree polynomial (where one evaluation of the rule suffices) on my machine, HCubature is 5x faster than Cubature:
julia> @btime HCubature.hcubature($(t -> t[1]), $(SVector(0.,0.,0.)), $(SVector(1.,1.,1.)), rtol=1e-8)
2.979 μs (37 allocations: 1.45 KiB)
(0.49999999999999994, 5.551115123125783e-17)
julia> @btime Cubature.hcubature($(t -> t[1]), $(SVector(0.,0.,0.)), $(SVector(1.,1.,1.)), reltol=1e-8)
15.600 μs (132 allocations: 5.70 KiB)
(0.49999999999999994, 5.551115123125783e-17)
I just optimized the case where no subdivisions are required a bit further, and it is now down to 1.914 μs, or 8x faster than Cubature.hcubature for a trivial 3d integrand. (Haven't pushed that commit yet.)
(Note that in some sense the comparison to Cubature is unfair. If you called the C library directly it would be faster; there is a price paid for marshalling/unmarshalling the arguments for the ccall so that the Julia callback doesn't need to deal with raw pointers.)
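The marshalling cost being referred to can be illustrated with a sketch of the callback boundary (this is not Cubature.jl's actual code; `INTEGRAND` and `c_callback` are made-up names). A C integrator hands back raw pointers, so a Julia array has to be rebuilt around the coordinates on every function evaluation:

```julia
const INTEGRAND = Ref{Any}(nothing)   # stash for the user's integrand (illustrative)

# A C-callable evaluation function: the library passes raw pointers,
# and the wrapper reconstructs a Julia array for the user's callback
# on every single evaluation.
function c_callback(ndim::Cuint, xptr::Ptr{Cdouble}, fvalptr::Ptr{Cdouble})
    x = unsafe_wrap(Array, xptr, Int(ndim))   # per-evaluation wrapper allocation
    unsafe_store!(fvalptr, INTEGRAND[](x))    # write the result where C expects it
    return nothing
end

# In real code this would be handed to the C library via @cfunction, e.g.
# fptr = @cfunction(c_callback, Cvoid, (Cuint, Ptr{Cdouble}, Ptr{Cdouble}))
```

A pure-Julia implementation like HCubature skips this boundary entirely and can pass `SVector`s straight through.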
I agree that for low-degree polynomials the error-estimation scheme is irrelevant.
For your example above, I get the following on my machine on Julia 0.6 (this package is even faster on 0.5, and even more so on master):
julia> let f = x -> x[1]
@btime HCubature.hcubature($f, $(SVector(0.0, 0.0, 0.0)), $(SVector(1.0, 1.0, 1.0)))
end
982.000 ns (16 allocations: 400 bytes)
(0.49999999999999994, 5.551115123125783e-17)
julia> let f = (x, y, z) -> x
@btime NIntegration.nintegrate($f, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
end
605.363 ns (40 allocations: 912 bytes)
(0.5000000000000001, 7.412159432224628e-17, 127, 1 subregion)
Pretty close! For such a simple integrand with no adaptivity (single subregion) we are mostly measuring overhead in looking up and applying the rule, iterating over arrays, and things like that, and it wouldn't surprise me to get a 30% speedup by unrolling everything the way you can in 3d.jl.
In practice, I think it would be rare to have an integrand this cheap, so I'm skeptical that it's worthwhile to further optimize the rule-evaluation code.
(Even for an FEM code where people often integrate low-degree polynomials where the rule is exact, the integrand evaluation involves looking up things in mesh datastructures, coordinate transformations, etcetera.)
Performance-wise, the biggest annoyance for me at the moment is the number of allocations involved in managing/subdividing the region heap (in cases where lots of subdivisions are needed), and I suspect you have similar issues? Hopefully this will go away once an immutable struct like Region (what I called "box") can be stack-allocated. I see that you tried to improve things by adding mutable fields to your Region via a Box type (why not just use RefValue?).
I tried to cut down allocations as much as possible, but this is as far as I could get without stack-allocated structs. I tried using a RefValue instead of a mutable, but I think it was allocating more, so I left it like that (this may have changed on master).
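The two layouts being compared can be sketched like this (illustrative names only; neither struct matches the packages' actual types). Both put the mutable state behind one extra heap object per region, which is why the difference between them ends up being an empirical question:

```julia
# Option 1: an immutable region holding a RefValue for the mutable state.
struct RegionA{T}
    err::Base.RefValue{T}   # one heap allocation per region for the Ref
end

# Option 2: a small mutable wrapper type holding the mutable fields.
mutable struct Box{T}
    err::T
end
struct RegionB{T}
    box::Box{T}             # also one heap allocation per region
end
```

In either case the outer region struct stays immutable, so only the wrapped state can be updated in place.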
https://github.com/stevengj/HCubature.jl implements the Genz-Malik algorithm in pure Julia for arbitrary dimensionalities.