Closed mbauman closed 9 years ago
This is what I used when I noticed this problem. (Incidentally, it is also used to benchmark bounds checking.)
Basically, the point is to evaluate the arguments up front and benchmark only the outermost function call. Would this be too restrictive? (Also, with another wrapper it may not be necessary to evaluate the arguments at macro expansion time.)
Another issue with this approach is that it adds overhead if the function you want to evaluate is not inlined by the compiler. IIRC, when I was playing with rand, this overhead completely overshadowed everything else. A call-site inlining annotation would be nice...
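The idea of evaluating the arguments first and timing only the outermost call can be sketched roughly as follows. This is a minimal illustration, not the code referred to above and not this package's implementation; the names call_once and time_call are hypothetical.

```julia
# Hypothetical helpers for illustration only; not part of any package.

# The wrapper is never inlined, so every timed iteration pays for one extra call.
@noinline call_once(f, args...) = f(args...)

# The arguments are already evaluated by the time time_call runs,
# so only the outermost call (plus the wrapper call) is measured.
function time_call(f, args...; n::Int = 10_000)
    call_once(f, args...)             # warm up so compilation is not timed
    t0 = time_ns()
    for _ in 1:n
        call_once(f, args...)
    end
    return (time_ns() - t0) / n       # mean nanoseconds per call, wrapper included
end

A = rand(10)
t = time_call(getindex, A, 1)
println("mean ns per call: ", t)
```

For a very cheap function the wrapper call can be a large share of the measured time, which is the overhead concern raised above. The compiler may also still remove work it can prove has no effect, so treat the numbers as rough.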
Yes, I thought about doing that, too, but I wanted to retain the freedom of arbitrary expressions that the @benchmark macro currently provides. My pull request over at #17 could easily be modified to accept only a single function call and evaluate all arguments within the "outer" function. Perhaps that could be provided by a different macro (@bench_function or somesuch). Or JMW could decide that it's best to only allow benchmarking of functions, and we could then change the API.
As far as the additional function call for the @noinline'ed inner function goes, I actually like that it is always there. This allows for comparisons between functions that inline and those that don't (since there will be the cost of two function calls instead of just one).
After using it for a little while, I actually really like the non-constant-binding heuristic in #17. It almost always just does what I want. And it's really easy to change its behavior just by using a temporary variable or declaring something as const.
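To see why the constness of a binding matters here, the snippet below compares what Julia infers for functions that read a non-constant global, a const global, and an argument. It is only an illustration of the underlying language behavior, not the heuristic from #17; the function names are made up, and Base.return_types is an unexported introspection helper.

```julia
A = rand(100)           # non-constant global: its type may change at any time
const B = rand(100)     # constant binding: its type is known to the compiler

sum_global() = sum(A)   # reads the non-constant global
sum_const()  = sum(B)   # reads the constant global
sum_arg(x)   = sum(x)   # receives the value as an argument (like a temporary variable)

# Inferred return types for each variant
rt_global = Base.return_types(sum_global, Tuple{})[1]
rt_const  = Base.return_types(sum_const, Tuple{})[1]
rt_arg    = Base.return_types(sum_arg, Tuple{Vector{Float64}})[1]

println((rt_global, rt_const, rt_arg))
```

The non-constant global forces dynamic dispatch inside sum_global, while the const binding and the argument both let the compiler specialize on Vector{Float64}.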
Should be resolved now.
This has bitten me more than once, and it's really tough to figure out what's going on. This package is really awesome, but I still sometimes run into this snag. Instead of splicing the expression to benchmark directly into the benchmarking function, could we wrap it with a @noinline function first?
I ran into this while testing out the new checkbounds API at https://github.com/JuliaLang/julia/pull/11895#issuecomment-130335729. You won't see these sorts of crazy effects with the stock checkbounds, but this is a really good demonstration:
Whoa now, will the real result please stand up? As far as I can figure out, this is happening due to a crazy confluence of things:
- checkbounds should be inlined
- esc(core) ends up splicing this into @benchmarkable: #673#out = (Main.checkbounds)(Main.A,1)
- Main.A isn't const, so it has to look it up and resolve method dispatch every time
- checkbounds is called via jl_apply_generic, so it has to allocate a tuple for the vararg, creating a GC frame
So… how do we resolve this? Putting expressions into a @noinline'ed function may help resolve #5 by hiding a bit more from LLVM's optimizations. Is it possible to determine non-constant bindings at macro eval time? It seems like that's what we'd need to mitigate the name resolution craziness:
- A @noinline'ed function, which wraps the expression
- The @benchmarkable function, with a type assertion that was spliced in at parse-time, e.g., A = Main.A::$(typeof(Main.A)). This will allow dispatch to work normally, and will throw an error if Main.A ends up changing, which seems reasonable.
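A hand-written sketch of what the two proposed pieces might look like is shown below. It is an assumption about the shape of the generated code, not what @benchmarkable actually emits; the names inner and outer are hypothetical, and the global is referenced as A rather than Main.A so the snippet does not depend on the enclosing module.

```julia
A = rand(10)    # non-constant global, as in the issue

# Piece 1: a never-inlined function that wraps the expression being benchmarked.
@noinline inner(a) = checkbounds(Bool, a, 1)

# Piece 2: the outer function. The type of A at definition time is spliced in
# with $(...), so the global lookup is followed by a concrete type assertion.
@eval function outer()
    a = A::$(typeof(A))     # throws a TypeError if A is later rebound to another type
    return inner(a)         # dispatch on a concretely typed value
end

println(outer())
println(Base.return_types(outer, Tuple{})[1])
```

With the assertion in place, the call to inner can be resolved statically, and rebinding A to a value of a different type makes outer fail loudly instead of silently benchmarking a dynamic dispatch.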