Open daniellivingston opened 5 years ago
@cwgable modified the `intrp_gtg` subroutine to call a new subroutine, `nearestpoint1`, which does not allocate an array on call, but requires an array of length `mtri` to be allocated prior to calling.
These changes are on branch `new-interpolate`.
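The difference between the two calling patterns can be sketched as follows. This is a minimal illustration, not the LaGriT code: the module, subroutine, and variable names (`scratch_demo`, `query_alloc_per_call`, `query_with_work`, `work`) are hypothetical, the argument lists do not mirror `nearestpoint0` or `nearestpoint1`, and standard `allocate`/`deallocate` stands in for whatever memory manager the real routines use.

```fortran
! Sketch only: all names here are hypothetical and the argument lists
! do not match the real nearestpoint0/nearestpoint1 subroutines.
module scratch_demo
  implicit none
contains

  ! Old pattern: a scratch array of length mtri is allocated and
  ! released on every call, even if the query touches few entries.
  subroutine query_alloc_per_call(mtri, q, res)
    integer, intent(in) :: mtri
    double precision, intent(in) :: q
    double precision, intent(out) :: res
    double precision, allocatable :: work(:)
    allocate(work(mtri))
    work(1) = q*q              ! stand-in for the real search
    res = sqrt(work(1))
    deallocate(work)
  end subroutine query_alloc_per_call

  ! New pattern: the caller owns the scratch array and passes it in,
  ! so the call itself performs no allocation.
  subroutine query_with_work(mtri, q, work, res)
    integer, intent(in) :: mtri
    double precision, intent(in) :: q
    double precision, intent(inout) :: work(mtri)
    double precision, intent(out) :: res
    work(1) = q*q              ! same stand-in computation
    res = sqrt(work(1))
  end subroutine query_with_work

end module scratch_demo

program demo
  use scratch_demo
  implicit none
  integer, parameter :: mtri = 100000, nelem = 1000
  double precision, allocatable :: work(:)
  double precision :: a, b
  integer :: i

  allocate(work(mtri))         ! allocated once, before the loop
  do i = 1, nelem
    call query_alloc_per_call(mtri, dble(i), a)
    call query_with_work(mtri, dble(i), work, b)
    if (a /= b) error stop 'results differ'
  end do
  deallocate(work)             ! released once, after the loop
  print *, 'both variants agree'
end program demo
```

With the per-call variant, `nelem` calls imply `nelem` allocations of an `mtri`-length array; with the caller-owned variant there is one allocation regardless of `nelem`.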
Tertiary testing shows a dramatic speed improvement:
NP | Runtime (new interpolate) (s) | Runtime (old interpolate) (s) |
---|---|---|
11 | 0.05824398994445801 | 0.07340097427368164 |
21 | 0.08977389335632324 | 0.12838387489318848 |
31 | 0.1591808795928955 | 0.5874381065368652 |
41 | 0.3121931552886963 | 2.2899179458618164 |
51 | 0.545767068862915 | 7.575886964797974 |
61 | 0.8833439350128174 | 19.65384817123413 |
71 | 1.4143941402435303 | 46.490236043930054 |
81 | 2.0786960124969482 | 101.2328200340271 |
91 | 3.044241189956665 | 220.3300280570984 |
where `number_of_elements = NP**3`.
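As a rough check on the table, the growth exponent p can be estimated from two rows, assuming runtime scales as t ∝ n^p with n = NP**3. This is a two-point estimate from single timings (rows NP = 51 and NP = 91), so the values are approximate:

```latex
p \approx \frac{\ln\left(t_{91}/t_{51}\right)}{3\,\ln\left(91/51\right)}

p_{\text{old}} \approx \frac{\ln(220.33/7.576)}{3\,\ln(91/51)} \approx \frac{3.370}{1.737} \approx 1.94

p_{\text{new}} \approx \frac{\ln(3.044/0.5458)}{3\,\ln(91/51)} \approx \frac{1.719}{1.737} \approx 0.99
```

By this estimate the old routine is close to quadratic in the number of elements and the new one is close to linear. At NP = 91 the new branch is about 72 times faster (220.33 s versus 3.04 s).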
I am seeing two other subroutines that call `nearestpoint0`, which may or may not need to be changed:
~/playground/LaGriT-new-interpolate/src new-interpolate*
❯ ag "call nearestpoint0"
dopmat.f
1308: call nearestpoint0(xq,yq,zq,xs,ys,zs,linkt,sbox,eps,
upscale.f
904: call nearestpoint0(xq,yq,zq,xs,ys,zs,linkt,sbox,eps,
As of 7ecf4e9978afbcc7778db376171c204b3b25d111, `nearestpoint0` has been fully deprecated in favor of `nearestpoint1`.
~/playground/LaGriT-new-interpolate new-interpolate*
❯ ag "call nearestpoint0" src/*.f*
~/playground/LaGriT-new-interpolate new-interpolate*
❯ ag "call nearestpoint1" src/*.f*
src/dopmat.f
1312: call nearestpoint1(xq,yq,zq,xs,ys,zs,linkt,sbox,eps,
src/intrp_gtg.f
1292: call nearestpoint1(xp,yp,zp,xs,ys,zs,linkt,sbox,eps,
src/upscale.f
911: call nearestpoint1(xq,yq,zq,xs,ys,zs,linkt,sbox,eps,
As of 6df48edc5ef319c8b9fbc6ca6a9ec75c5aaf9610:
- `nearestpoint0` was deleted and `nearestpoint1` was renamed `nearestpoint0`
- `distpossleaf` are released at the end of the do-loops enclosing `call nearestpoint0`.
- `level01` test suite passes
Subroutine `intrp_gtg` for the Voronoi case runs at O(n^2).
@cwgable has found that the issue lies in the `nearestpoint0` subroutine, which `intrp_gtg` calls for each element. `nearestpoint0` allocates and deallocates, for each call, a `real*8` array of length `mtri`.
This explains the callgrind profiling, where we've seen `mmgetblk` dominate runtime for large `nelem` meshes: