jmert commented 4 years ago

This originally started as a couple of simple changes to realign the code with the mathematical documentation, but while exploring the typed and LLVM IR I found several ways to improve its performance.

The major improvements are:

- Deduplicate the loop over `m` so that the loop over `l` is no longer written out twice. This substantially reduces the length of the generated code.
- Switch from broadcast operations to explicit loops. The shapes of all the interacting arrays are fully known, so the overhead of broadcasting's shape checks can be eliminated, giving large speedups.
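A minimal sketch of both ideas, assuming the textbook unnormalized recurrences with the Condon-Shortley phase (the package itself works with normalized polynomials, and `legendre_table` is a hypothetical name, not its API). There is one loop over `m` whose body updates the diagonal term and then runs a single shared recurrence over `l`, and everything is written as plain loops over preallocated arrays instead of broadcasts.

```julia
# Hypothetical sketch, not AssociatedLegendrePolynomials.jl's implementation.
# Unnormalized associated Legendre functions P_l^m(x) (Condon-Shortley phase)
# for 0 <= m <= l <= lmax, stored as P[l+1, m+1, i] = P_l^m(x[i]).
# Entries with m > l are left at zero.
function legendre_table(lmax::Integer, x::AbstractVector{T}) where {T<:AbstractFloat}
    n = length(x)
    P = zeros(T, lmax + 1, lmax + 1, n)
    pmm = ones(T, n)                  # diagonal term P_m^m(x[i]); starts at P_0^0 = 1
    for m in 0:lmax
        @inbounds for i in 1:n
            if m > 0
                # Diagonal step: P_m^m = -(2m - 1) * sqrt(1 - x^2) * P_{m-1}^{m-1}
                pmm[i] *= -(2m - 1) * sqrt(one(T) - x[i]^2)
            end
            # One l-loop shared by every m. Seeding P_{m-1}^m = 0 makes the
            # l = m + 1 step a special case of the general three-term recurrence,
            # so it needs no separate code path.
            plm2 = zero(T)            # P_{l-2}^m
            plm1 = pmm[i]             # P_{l-1}^m
            P[m + 1, m + 1, i] = plm1
            for l in (m + 1):lmax
                pl = ((2l - 1) * x[i] * plm1 - (l + m - 1) * plm2) / (l - m)
                P[l + 1, m + 1, i] = pl
                plm2, plm1 = plm1, pl
            end
        end
    end
    return P
end
```

For example, `legendre_table(2, [0.5, -0.2])[3, 2, 1]` is P_2^1(0.5) = -1.5 * sqrt(0.75), about -1.299. Because the output shape is fixed by `lmax` and `length(x)` up front, the loops never need the runtime shape reconciliation that broadcasting performs.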

Using the new benchmark script from #11, the comparison against master gives:

"LegendreUnitNorm" => 12-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "(\"outdim0\", \"indim0\")" => TrialJudgement(-20.61% => improvement)
        "(\"outdim0\", \"indim1\")" => TrialJudgement(-51.90% => improvement)
        "(\"outdim0\", \"indim2\")" => TrialJudgement(-52.13% => improvement)
        "(\"outdim0\", \"inscalar\")" => TrialJudgement(-4.69% => invariant)
        "(\"outdim1\", \"indim0\")" => TrialJudgement(-54.08% => improvement)
        "(\"outdim1\", \"indim1\")" => TrialJudgement(-67.33% => improvement)
        "(\"outdim1\", \"indim2\")" => TrialJudgement(-67.66% => improvement)
        "(\"outdim1\", \"inscalar\")" => TrialJudgement(-53.54% => improvement)
        "(\"outdim2\", \"indim0\")" => TrialJudgement(-65.21% => improvement)
        "(\"outdim2\", \"indim1\")" => TrialJudgement(-50.80% => improvement)
        "(\"outdim2\", \"indim2\")" => TrialJudgement(-49.74% => improvement)
        "(\"outdim2\", \"inscalar\")" => TrialJudgement(-63.43% => improvement)
"LegendreNormCoeff{LegendreSphereNorm,Float64}" => 12-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "(\"outdim0\", \"indim0\")" => TrialJudgement(-24.83% => improvement)
        "(\"outdim0\", \"indim1\")" => TrialJudgement(-54.69% => improvement)
        "(\"outdim0\", \"indim2\")" => TrialJudgement(-53.74% => improvement)
        "(\"outdim0\", \"inscalar\")" => TrialJudgement(-1.40% => invariant)
        "(\"outdim1\", \"indim0\")" => TrialJudgement(-55.08% => improvement)
        "(\"outdim1\", \"indim1\")" => TrialJudgement(-67.62% => improvement)
        "(\"outdim1\", \"indim2\")" => TrialJudgement(-67.17% => improvement)
        "(\"outdim1\", \"inscalar\")" => TrialJudgement(-54.04% => improvement)
        "(\"outdim2\", \"indim0\")" => TrialJudgement(-64.09% => improvement)
        "(\"outdim2\", \"indim1\")" => TrialJudgement(-49.34% => improvement)
        "(\"outdim2\", \"indim2\")" => TrialJudgement(-50.86% => improvement)
        "(\"outdim2\", \"inscalar\")" => TrialJudgement(-66.42% => improvement)
"LegendreSphereNorm" => 12-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "(\"outdim0\", \"indim0\")" => TrialJudgement(-17.59% => improvement)
        "(\"outdim0\", \"indim1\")" => TrialJudgement(-54.17% => improvement)
        "(\"outdim0\", \"indim2\")" => TrialJudgement(-50.57% => improvement)
        "(\"outdim0\", \"inscalar\")" => TrialJudgement(-0.29% => invariant)
        "(\"outdim1\", \"indim0\")" => TrialJudgement(-53.14% => improvement)
        "(\"outdim1\", \"indim1\")" => TrialJudgement(-65.32% => improvement)
        "(\"outdim1\", \"indim2\")" => TrialJudgement(-65.75% => improvement)
        "(\"outdim1\", \"inscalar\")" => TrialJudgement(-48.89% => improvement)
        "(\"outdim2\", \"indim0\")" => TrialJudgement(-60.70% => improvement)
        "(\"outdim2\", \"indim1\")" => TrialJudgement(-52.72% => improvement)
        "(\"outdim2\", \"indim2\")" => TrialJudgement(-52.26% => improvement)
        "(\"outdim2\", \"inscalar\")" => TrialJudgement(-62.06% => improvement)
"LegendreNormCoeff{LegendreUnitNorm,Float64}" => 12-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "(\"outdim0\", \"indim0\")" => TrialJudgement(-25.32% => improvement)
        "(\"outdim0\", \"indim1\")" => TrialJudgement(-52.12% => improvement)
        "(\"outdim0\", \"indim2\")" => TrialJudgement(-51.25% => improvement)
        "(\"outdim0\", \"inscalar\")" => TrialJudgement(-1.45% => invariant)
        "(\"outdim1\", \"indim0\")" => TrialJudgement(-54.67% => improvement)
        "(\"outdim1\", \"indim1\")" => TrialJudgement(-66.70% => improvement)
        "(\"outdim1\", \"indim2\")" => TrialJudgement(-66.90% => improvement)
        "(\"outdim1\", \"inscalar\")" => TrialJudgement(-54.17% => improvement)
        "(\"outdim2\", \"indim0\")" => TrialJudgement(-63.76% => improvement)
        "(\"outdim2\", \"indim1\")" => TrialJudgement(-49.75% => improvement)
        "(\"outdim2\", \"indim2\")" => TrialJudgement(-50.63% => improvement)
        "(\"outdim2\", \"inscalar\")" => TrialJudgement(-72.23% => improvement)

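Every case except scalar input to scalar output (`outdim0`, `inscalar`), which is invariant, gets faster. The 0-dimensional input and output cases (`outdim0`, `indim0`) improve by about 18–25% (roughly 1.2–1.3×), and all remaining cases improve by about 49–72%, i.e. roughly 2–3.6× faster.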
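To translate the judgement percentages into speedup factors, a small sketch, assuming each percentage is the relative change in time, (new/old - 1) × 100, which is how BenchmarkTools reports a `TrialJudgement`. The `speedup` helper is hypothetical and not part of the benchmark script.

```julia
# Assumes each judgement percentage is (new_time / old_time - 1) * 100,
# so the speedup factor old_time / new_time is 1 / (1 + pct / 100).
speedup(pct::Real) = 1 / (1 + pct / 100)

# A few entries copied from the judgements above
cases = [
    "LegendreSphereNorm (outdim0, indim0)" => -17.59,
    "LegendreSphereNorm (outdim1, inscalar)" => -48.89,
    "LegendreNormCoeff{LegendreUnitNorm,Float64} (outdim2, inscalar)" => -72.23,
]
for (name, pct) in cases
    println(name, ": ", round(speedup(pct); digits = 2), "x faster")
end
```

A -50% change is therefore exactly a 2× speedup, while the -17.59% case is only about 1.21×.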

codecov[bot] commented 4 years ago

Codecov Report

Merging #12 into master will decrease coverage by 1.43%. The diff coverage is 97.87%.

@@            Coverage Diff             @@
##           master      #12      +/-   ##
==========================================
- Coverage   96.91%   95.47%   -1.44%     
==========================================
  Files           9        9              
  Lines         259      243      -16     
==========================================
- Hits          251      232      -19     
- Misses          8       11       +3
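The coverage percentages above are simply hits divided by tracked lines (hits + misses; this report has no partial lines). A quick check of the arithmetic, where `coverage` is a hypothetical helper:

```julia
# Codecov line coverage: hits / (hits + misses), with no partial lines here
coverage(hits, misses) = 100 * hits / (hits + misses)

master = coverage(251, 8)    # 259 tracked lines
pr12   = coverage(232, 11)   # 243 tracked lines
println("master: ", round(master; digits = 2), "%")
println("#12:    ", round(pr12; digits = 2), "%")
println("change: ", round(pr12 - master; digits = 3), " points")
```

The unrounded change is about -1.438 percentage points, which is consistent with both the 1.43% in the summary and the -1.44% in the table.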

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/calculation.jl | `95.65% <97.87%> (-0.54%)` | :arrow_down: |
| src/scalar.jl | `83.33% <0.00%> (-16.67%)` | :arrow_down: |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 9d910fc...944c0d9.

jmert commented 4 years ago

The drop in coverage appears to be an artifact of Julia's coverage tracking, most likely because inlined code is not credited to its source lines. If I run coverage locally with `julia --inline=no`, every line is still exercised by the tests.

jmert / AssociatedLegendrePolynomials.jl

Rewrite core computation to match docs and for better performance #12
