Use a no-inline inner function

mbauman commented 9 years ago

This PR walks the passed expression, finds the bindings that were marked as non-constant, and makes those the arguments to an inner function. This prevents LLVM from doing constant folding within the inner function.
Mark the inner function as @noinline. This prevents LLVM from constant-folding the benchmarking loop.
Within the outer benchmarking function, copy these non-constant bindings to local variables with type assertions. This ensures that Julia can pre-compute the method lookup to the inner function, avoiding dynamic method resolution.
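
The three steps above can be sketched roughly as follows. This is a hand-written illustration in current Julia syntax, not the code generated by the macro in this PR; the names `inner`, `outer`, and `n_evals` are hypothetical, and the type in the assertion is written out by hand where the macro would presumably derive it from the binding.

```julia
# Hypothetical sketch of the inner/outer function pattern; not the PR's code.

A = zeros(10, 10)   # a non-constant global binding used by the benchmarked expression

# Steps 1 and 2: the non-constant binding becomes an argument of the inner
# function, and @noinline asks the compiler not to inline the call into the loop.
@noinline inner(a) = checkbounds(a, 1)

# Step 3: the outer function copies the global into a local variable with a
# type assertion, so the call to `inner` can be resolved at compile time.
function outer(n_evals::Integer)
    a = A::Matrix{Float64}
    start = time_ns()
    for _ in 1:n_evals
        inner(a)
    end
    return (time_ns() - start) / n_evals   # average nanoseconds per evaluation
end
```

Calling `outer(1000)` returns an average time per evaluation that includes the cost of the call to `inner`, which is the trade-off listed under the negative effects.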

Positive effects:

Benchmarks are more consistent across all function definition types (vararg splats, inlined vs. not-inlined, etc).
LLVM no longer constant-folds simple benchmarks, fixing #5.

Negative effects:

Minimum benchmark time is constrained to that of a function call.

Demo. The minimum time is that of a function call:

julia> @benchmark 1
================ Benchmark Results ========================
   Average elapsed time: 3.25 ns
     95% CI for average: [3.18 ns, 3.32 ns]
   Minimum elapsed time: 8.64 ns
                GC time: 0.00%
       Memory allocated: 0 bytes
  Number of allocations: 0 allocations
      Number of samples: 3401
        R² of OLS model: 0.959
Time used for benchmark: 0.10s
            Precompiled: true
       Multiple samples: true
       Search performed: true

And the pathological cases over at #16 are fixed:

julia> A = zeros(10,10);
julia> const B = zeros(10,10);

julia> @benchmark checkbounds(A, 1)
================ Benchmark Results ========================
   Average elapsed time: 3.73 ns
     95% CI for average: [3.65 ns, 3.81 ns]
   Minimum elapsed time: 7.13 ns
                GC time: 0.00%
       Memory allocated: 0 bytes
  Number of allocations: 0 allocations
      Number of samples: 3901
        R² of OLS model: 0.954
Time used for benchmark: 0.03s
            Precompiled: true
       Multiple samples: true
       Search performed: true

julia> @benchmark checkbounds(B, 1)
================ Benchmark Results ========================
   Average elapsed time: 3.87 ns
     95% CI for average: [3.79 ns, 3.94 ns]
   Minimum elapsed time: 6.27 ns
                GC time: 0.00%
       Memory allocated: 0 bytes
  Number of allocations: 0 allocations
      Number of samples: 4201
        R² of OLS model: 0.956
Time used for benchmark: 0.04s
            Precompiled: true
       Multiple samples: true
       Search performed: true

julia> @noinline f(A) = checkbounds(A, 1)
f (generic function with 1 method)

julia> @benchmark f(A)
================ Benchmark Results ========================
   Average elapsed time: 6.39 ns
     95% CI for average: [6.29 ns, 6.49 ns]
   Minimum elapsed time: 6.10 ns
                GC time: 0.00%
       Memory allocated: 0 bytes
  Number of allocations: 0 allocations
      Number of samples: 7701
        R² of OLS model: 0.952
Time used for benchmark: 0.09s
            Precompiled: true
       Multiple samples: true
       Search performed: true

julia> @benchmark f(B)
================ Benchmark Results ========================
   Average elapsed time: 6.44 ns
     95% CI for average: [6.33 ns, 6.55 ns]
   Minimum elapsed time: 6.67 ns
                GC time: 0.00%
       Memory allocated: 0 bytes
  Number of allocations: 0 allocations
      Number of samples: 6501
        R² of OLS model: 0.950
Time used for benchmark: 0.09s
            Precompiled: true
       Multiple samples: true
       Search performed: true

johnmyleswhite commented 9 years ago

Finally have some time to finish this package. I think this approach is reasonable and I'm willing to accept the cost of a function call to minimize weird effects elsewhere. Would be great to get this rebased whenever you have some time.

The one thing I'm worried about is that this seems to remove the possibility of benchmarking expressions that depend upon variables that have to be set up in the setup expression, because those variables are now globals being accessed by the inner function.

Or is that concern not relevant for a reason that's not obvious to me?

mbauman commented 9 years ago

Ah, no, you're right. I didn't think about the setup expr. In fact, I think benchmarks that require variables from a setup expr will now fail, since those variables aren't global. I'm not sure how best to deal with that.
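
A minimal sketch of the failure mode described here, assuming the setup expression runs inside the outer function while the inner function is defined at top level. All names (`inner_uses_global`, `outer_broken`, `inner_takes_arg`, `outer_fixed`, `setup_values`) are hypothetical and do not come from the package.

```julia
# Hypothetical illustration; not the code from this PR.

# The inner function refers to `setup_values` by name, so Julia looks it up
# as a global binding when the function is called.
@noinline inner_uses_global() = sum(setup_values)

function outer_broken()
    setup_values = [1.0, 2.0, 3.0]   # local to outer_broken, created by "setup"
    return inner_uses_global()       # throws UndefVarError: no such global exists
end

# Passing the setup variable as an argument avoids the global lookup.
@noinline inner_takes_arg(v) = sum(v)

function outer_fixed()
    setup_values = [1.0, 2.0, 3.0]
    return inner_takes_arg(setup_values)
end
```

The second form only works if the macro knows which names the setup expression introduces, which is the part that remains open in this thread.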

One possible alternative would be to only permit benchmarking single functions with the simple @benchmark macro, and then evaluate all arguments to that function as "setup". This is similar to what @Yuyichao proposed in https://github.com/johnmyleswhite/Benchmarks.jl/issues/16#issuecomment-136337077. The constant binding/literal vs mutable binding heuristic I use here is a little subtle.
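
One way to read the alternative above is a macro that accepts only a single function call and evaluates every argument once before timing. The sketch below makes that assumption. The macro name `@time_call` is hypothetical, it handles positional arguments only, and it times a single evaluation rather than running a benchmarking loop.

```julia
# Hypothetical sketch of "benchmark a single call, treat the arguments as setup".
macro time_call(ex)
    if !(ex isa Expr && ex.head === :call)
        error("@time_call expects a single function call")
    end
    f = esc(ex.args[1])               # the function being benchmarked
    args = map(esc, ex.args[2:end])   # argument expressions, evaluated as "setup"
    return quote
        let vals = ($(args...),)      # evaluate all arguments once, up front
            t0 = time_ns()
            result = $f(vals...)      # only the call itself is timed
            (result, time_ns() - t0)
        end
    end
end
```

For example, `@time_call sum([1, 2, 3])` builds the array before the timer starts and returns the result `6` together with the elapsed nanoseconds for the `sum` call alone.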

johnmyleswhite commented 9 years ago

I feel like getting this totally right is slightly above my current level of understanding of the compiler. I might try to start an e-mail thread with some core compiler folks to figure this out.

johnmyleswhite / Benchmarks.jl

Use a no-inline inner function #17