Closed: mbauman closed this 9 years ago
I'm not sure whether we should remove the `@noinline`'d inner function.
Pros to removing it:
Cons to removing it:
This does solve @staticfloat's use case in #22:
julia> let
           x = 10
           @benchmark sin(x)
       end
================ Benchmark Results ========================
Time per evaluation: 23.76 ns [23.54 ns, 23.98 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 9601
Number of evaluations: 18827301
R² of OLS model: 0.978
Time spent benchmarking: 0.53 s
julia> @benchmark sin(x)
================ Benchmark Results ========================
Time per evaluation: 23.85 ns [23.60 ns, 24.11 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 9601
Number of evaluations: 18827301
R² of OLS model: 0.971
Time spent benchmarking: 0.53 s
I'm confused; shouldn't the second benchmark above fail? There should be no "x" in scope to pass in to sin().
There was an `x = 10` in global scope, too; I just missed copying it in.
Could we somehow allow users to opt in to the no-inline version (and suggest that they do if we suspect the whole loop gets optimized away), or would that muddy the clear semantic reasoning we are building here?
I can go either way here (actually, I think I just convinced myself), but we definitely should not do or even allow both.
If we use a noinline function, there is simply always a constant function-call overhead, so results are comparable across benchmarks. E.g., if I see 3 ns, I know there was one function call plus minimal work (the function I was testing got inlined into the noinline test function); 6 ns means two function calls plus minimal work (it didn't inline!).
Without it, everything is one function-call overhead faster, so you can still compare an inlined result with a function that didn't inline. Well, sort of. Inlined functions can behave in absolutely any way: LLVM can do all sorts of wonky optimizations, ranging from full loop elision to hoisting some of the work to doing nothing at all. And that is an interaction with the testing loop, not with the function call itself.
So let's reinstate the `@noinline` barrier. Sound good?
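The tradeoff above can be sketched roughly like this (names are hypothetical, not the actual Benchmarks.jl internals): the `@noinline` inner function pays one constant call overhead on every iteration, which keeps timings comparable across benchmarks.

```julia
# Sketch of the @noinline barrier, assuming hypothetical helper names.
@noinline function _inner_call(f, x)
    return f(x)   # f may still inline *here*, but _inner_call itself
end               # never inlines into the timing loop below

function time_per_call(f, x, n)
    t0 = time_ns()
    for _ in 1:n
        _inner_call(f, x)   # constant one-call overhead per iteration
    end
    return (time_ns() - t0) / n
end

time_per_call(sin, 1.0, 1_000)
```

Without the `@noinline`, `f(x)` can fuse with the loop itself, and the "one call = one unit of overhead" accounting no longer holds.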
Yes, I agree. It's better to have reliable test machinery than fast test machinery.
On Thu, Oct 1, 2015, 12:20 PM Matt Bauman notifications@github.com wrote:
also agree
Alright, updated to use inner functions. I've also used gensyms for the names, just to be safe, since I'm not 100% sure how hygiene works with nested macros calling inline macros.
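A minimal sketch of the gensym approach, assuming a toy macro (`@bench_sketch` and the generated names are illustrative, not the PR's actual code): each expansion gets a fresh, collision-proof name for its inner function.

```julia
# Toy macro showing gensym'd inner-function names (not the real macro).
macro bench_sketch(ex)
    inner = gensym("inner")              # fresh name, e.g. Symbol("##inner#257")
    quote
        @noinline $(esc(inner))() = $(esc(ex))   # no-inline barrier
        $(esc(inner))()                          # call through the barrier
    end
end

x = 10
@bench_sketch sin(x)   # evaluates sin(x) through a gensym'd inner function
```

Because the name is generated by `gensym`, two expansions in the same scope can never collide, regardless of how hygiene interacts with the nested macros.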
I tested this out, and it seems to work for my benchmarks. Thanks Matt!
Ok, I'm pretty happy with this design. Rebase and merge?
Done and done!
Consider the function's arguments to be setup expressions, which are executed in an outer closure. The resulting arguments are then passed through a function barrier to the actual benchmarking function, which ensures that Julia has inferred concrete types for each of them.
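The pattern described can be sketched like this (names are hypothetical stand-ins for the Benchmarks.jl internals): setup expressions run once in an outer closure, and their results cross a function barrier so the kernel is compiled for their concrete types.

```julia
# Stand-in for the actual benchmarking function behind the barrier.
_kernel(a, b) = a .+ b

function run_with_setup(setup_a, setup_b)
    a = setup_a()            # setup expressions, evaluated in the outer closure
    b = setup_b()
    return _kernel(a, b)     # barrier: Julia specializes on typeof(a), typeof(b)
end

run_with_setup(() -> rand(3), () -> rand(3))
```

Even if `a` and `b` were type-unstable in the outer closure, the dynamic dispatch at the barrier means `_kernel` itself runs on fully inferred, concretely typed arguments.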
Note that this removes the `@noinline`'d inner function. This means that if the function being benchmarked is simple enough (or manually flagged) to inline, LLVM may eliminate the loop. This reintroduces #5.
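The failure mode being reintroduced looks like this (a hypothetical example, not code from the PR): with an inlinable, side-effect-free body whose result is unused, LLVM is free to delete the entire timing loop, producing an implausibly small time.

```julia
# Illustration of loop elision risk; `trivial` and `elidable` are made up.
@inline trivial(x) = x + 1

function elidable(n)
    for i in 1:n
        trivial(i)    # inlines, result discarded: the loop may be deleted
    end
    return n
end

elidable(10_000)
```

With a `@noinline` barrier around the call, the optimizer cannot prove the body is dead, so the loop survives.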