Closed: mbauman closed this 9 years ago
I'm not sure whether we should remove the `@noinline`'d inner function.
Pros to removing it:
Cons to removing it:
This does solve @staticfloat's use case in #22:
julia> let
           x = 10
           @benchmark sin(x)
       end
================ Benchmark Results ========================
Time per evaluation: 23.76 ns [23.54 ns, 23.98 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 9601
Number of evaluations: 18827301
R² of OLS model: 0.978
Time spent benchmarking: 0.53 s
julia> @benchmark sin(x)
================ Benchmark Results ========================
Time per evaluation: 23.85 ns [23.60 ns, 24.11 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 9601
Number of evaluations: 18827301
R² of OLS model: 0.971
Time spent benchmarking: 0.53 s
I'm confused; shouldn't the second benchmark above fail? There should be no "x" in scope to pass in to sin().
There was an `x = 10` in global scope, too; I just missed copying it in.
Could we somehow allow users to opt in to the no-inline version (and suggest that they do if we suspect the whole loop gets optimized away), or would that muddy the clear semantic reasoning we are building here?
I can go either way here (actually, I think I just convinced myself), but we definitely should not do or even allow both.
If we use a noinline function, there is simply always a constant function-call overhead, so results are comparable across benchmarks. E.g., if I see 3 ns, I know there was one function call plus minimal work (the function I was testing got inlined into the noinline test function); 6 ns means two function calls plus minimal work (it didn't inline!).
Without it, everything is one function-call overhead faster, so you can still compare an inlined result with a function that didn't inline. Well, sort of. Inlined functions can behave in absolutely any way: LLVM can do all sorts of wonky optimizations, ranging from full loop elision to hoisting some of the work to doing nothing at all. And that is an interaction with the testing loop, not with the function call itself.
So let's reinstate the `@noinline` barrier. Sound good?
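The tradeoff above can be sketched roughly like this (names are hypothetical, not the actual Benchmarks.jl internals): the `@noinline` inner function pays one constant call overhead on every iteration, which keeps timings comparable across benchmarks.

```julia
# Sketch of the @noinline barrier, assuming hypothetical helper names.
@noinline function _inner_call(f, x)
    return f(x)   # f may still inline *here*, but _inner_call itself
end               # never inlines into the timing loop below

function time_per_call(f, x, n)
    t0 = time_ns()
    for _ in 1:n
        _inner_call(f, x)   # constant one-call overhead per iteration
    end
    return (time_ns() - t0) / n
end

time_per_call(sin, 1.0, 1_000)
```

Without the `@noinline`, `f(x)` can fuse with the loop itself, and the "one call = one unit of overhead" accounting no longer holds.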
Yes, I agree. It's better to have reliable test machinery than fast test machinery.
On Thu, Oct 1, 2015, 12:20 PM Matt Bauman notifications@github.com wrote:
also agree
Alright, updated to use inner functions. I've also used gensyms for the names, just to be safe, since I'm not 100% sure how hygiene works with nested macros calling inline macros.
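A minimal sketch of the gensym approach, assuming a toy macro (`@bench_sketch` and the generated names are illustrative, not the PR's actual code): each expansion gets a fresh, collision-proof name for its inner function.

```julia
# Toy macro showing gensym'd inner-function names (not the real macro).
macro bench_sketch(ex)
    inner = gensym("inner")              # fresh name, e.g. Symbol("##inner#257")
    quote
        @noinline $(esc(inner))() = $(esc(ex))   # no-inline barrier
        $(esc(inner))()                          # call through the barrier
    end
end

x = 10
@bench_sketch sin(x)   # evaluates sin(x) through a gensym'd inner function
```

Because the name is generated by `gensym`, two expansions in the same scope can never collide, regardless of how hygiene interacts with the nested macros.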
I tested this out, and it seems to work for my benchmarks. Thanks Matt!
Ok, I'm pretty happy with this design. Rebase and merge?
Done and done!
Consider the function's arguments to be setup expressions, which are executed in an outer closure. The resulting arguments are then passed through a function barrier to the actual benchmarking function, which ensures that Julia has inferred concrete types for each of them.
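The pattern described can be sketched like this (names are hypothetical stand-ins for the Benchmarks.jl internals): setup expressions run once in an outer closure, and their results cross a function barrier so the kernel is compiled for their concrete types.

```julia
# Stand-in for the actual benchmarking function behind the barrier.
_kernel(a, b) = a .+ b

function run_with_setup(setup_a, setup_b)
    a = setup_a()            # setup expressions, evaluated in the outer closure
    b = setup_b()
    return _kernel(a, b)     # barrier: Julia specializes on typeof(a), typeof(b)
end

run_with_setup(() -> rand(3), () -> rand(3))
```

Even if `a` and `b` were type-unstable in the outer closure, the dynamic dispatch at the barrier means `_kernel` itself runs on fully inferred, concretely typed arguments.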
Note that this removes the `@noinline`'d inner function. This means that if the function being benchmarked is simple enough (or manually flagged) to inline, LLVM may eliminate the loop. This reintroduces #5.
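The failure mode being reintroduced looks like this (a hypothetical example, not code from the PR): with an inlinable, side-effect-free body whose result is unused, LLVM is free to delete the entire timing loop, producing an implausibly small time.

```julia
# Illustration of loop elision risk; `trivial` and `elidable` are made up.
@inline trivial(x) = x + 1

function elidable(n)
    for i in 1:n
        trivial(i)    # inlines, result discarded: the loop may be deleted
    end
    return n
end

elidable(10_000)
```

With a `@noinline` barrier around the call, the optimizer cannot prove the body is dead, so the loop survives.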