Closed johnomotani closed 4 weeks ago
Wow, a 3x speed-up would be impressive!

What I don't understand, and am wondering about, is how you knew that the "for loop" version
```julia
function lagrange_poly(j, x_nodes, x)
    # get number of nodes
    n = size(x_nodes, 1)
    # location where l(x0) = 1
    x0 = x_nodes[j]
    # evaluate polynomial
    poly = 1.0
    for i in 1:j-1
        poly *= (x - x_nodes[i]) / (x0 - x_nodes[i])
    end
    for i in j+1:n
        poly *= (x - x_nodes[i]) / (x0 - x_nodes[i])
    end
    return poly
end
```
would be slower than the version using Julia base functions
```julia
function lagrange_poly_optimised(other_nodes, one_over_denominator, x)
    return prod(x - n for n ∈ other_nodes) * one_over_denominator
end
```
In my previous life as a Fortran developer, I would have expected a negligible difference between the two methods in a compiled language -- what am I missing?
Some timing checks would be helpful here to justify the changes (from the comments it looks like you are planning to do this anyway). If this method were generalised to an N-dimensional case, I would be concerned that the memory requirement for storing the "other nodes" for every node might become an issue.
@mrhardman I don't think it was other changes that made the speedup. I didn't test separately, but the significant things are:
I did also try a for-loop version of `lagrange_poly_optimised()`, which had essentially identical timing to the one using `prod`. `prod` was marginally faster (although the difference could have just been noise), and is more compact to write, so I chose to use that version.
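(For reference, the for-loop variant being compared would look something like the following sketch; the code actually tried may have differed slightly. Both versions use the precomputed `other_nodes` and `one_over_denominator`, so the only difference is loop syntax versus a `prod` generator expression.)

```julia
# Sketch of a for-loop equivalent of lagrange_poly_optimised().
# Same precomputed inputs, no per-node divisions; only the loop style differs.
function lagrange_poly_optimised_loop(other_nodes, one_over_denominator, x)
    poly = one_over_denominator
    for n in other_nodes
        poly *= (x - n)
    end
    return poly
end
```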
I still think we'll want to do N 1d interpolations rather than one N-dimensional interpolation (although that has its own memory cost of at least one buffer array...), so I'm only anticipating 1d versions of "other nodes". It's true that if we wanted an N-dimensional version, the memory usage for "other nodes" could become a concern!
> Some timing checks would be helpful here to justify the changes
I did the one test on a case I happened to be running anyway. I was using the profiler (StatProfilerHTML) to measure how much time was spent in different parts of the code, so it's not too convenient to post the results. The speedup (factor of a few) was in line with what I'd expect, so I wasn't planning to do any more timing for this PR.
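(For anyone wanting to reproduce this kind of measurement, StatProfilerHTML is typically used along these lines. This is a usage sketch, not runnable as-is: it needs the StatProfilerHTML package installed, and `run_case()` is a hypothetical stand-in for whatever entry point you are profiling.)

```julia
using StatProfilerHTML  # renders Julia's statistical profiler output as HTML

run_case()               # warm-up call, so compilation isn't profiled
@profilehtml run_case()  # profile the call and write an HTML report
```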
I did suggest a replacement for `interpolate_2D_vspace!()` that I think should be faster, but left it commented out because it would need testing for both correctness and performance, and it needs an additional pdf-sized buffer passed as an argument - I wasn't planning to include that change in this PR unless someone else has time to do the testing.
> I did suggest a replacement for `interpolate_2D_vspace!()` that I think should be faster, but left it commented out because it would need testing for both correctness and performance, and it needs an additional pdf-sized buffer passed as an argument - I wasn't planning to include that change in this PR unless someone else has time to do the testing.
There is already an existing CI test for the `interpolate_2D_vspace!()` function in the Fokker-Planck tests that you should be able to quickly modify if you want to include this here. You would be able to add a logical flag to the list of tests made at this line https://github.com/mabarnes/moment_kinetics/blob/73f8bf5a99dc1f07272f758208543caf3fac41f3/moment_kinetics/test/fokker_planck_tests.jl#L75, and then switch between my original interpolation and your optimised version at these lines https://github.com/mabarnes/moment_kinetics/blob/73f8bf5a99dc1f07272f758208543caf3fac41f3/moment_kinetics/test/fokker_planck_tests.jl#L121 and https://github.com/mabarnes/moment_kinetics/blob/73f8bf5a99dc1f07272f758208543caf3fac41f3/moment_kinetics/test/fokker_planck_tests.jl#L125.
Hope this helps.
Precalculate some quantities to minimise the amount of work needed to evaluate the Lagrange polynomials, and hopefully to make the loops vectorise better. Speeds up interpolation by about 3x in one case that I profiled.