The following benchmark gives a very unexpected result.
julia> using Benchmarks
julia> @benchmark rand()
================ Benchmark Results ========================
Time per evaluation: 5.93 ns [5.86 ns, 6.01 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 11001
Number of evaluations: 71490001
R² of OLS model: 0.951
Time spent benchmarking: 0.65 s
julia> @benchmark rand(Float32)
================ Benchmark Results ========================
Time per evaluation: 51.57 ns [50.89 ns, 52.24 ns]
Proportion of time in GC: 0.18% [0.00%, 0.38%]
Memory allocated: 16.00 bytes
Number of allocations: 1 allocations
Number of samples: 10601
Number of evaluations: 48829501
R² of OLS model: 0.952
Time spent benchmarking: 2.91 s
The reason is that type widening causes the inner function for rand(Float32) to be specialized on Tuple{DataType} rather than Tuple{Type{Float32}}, which introduces all kinds of overhead.
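To see the effect in isolation, here is a minimal sketch (hypothetical helper names, written in current Julia syntax, not Base's actual code) contrasting a widened helper with a properly specialized one. The widened version loses the concrete type, so its result is boxed, which likely accounts for the 16 bytes / 1 allocation above:

# The DataType-typed argument gets a single shared specialization,
# so one(T) is a dynamic dispatch and the return type is Any.
widened(T::DataType) = one(T)

# The Type{T} signature gets one specialization per concrete type,
# so the return type is inferred exactly.
specialized(::Type{T}) where {T} = one(T)

widened(Float32)      # inferred as Any; boxed result, allocates
specialized(Float32)  # inferred as Float32; no allocation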
The simplest trick I can think of to avoid this issue is to use a staged inner function (which will turn off type widening...). IIUC, the call site of the inner function should always have the concrete type inferred, so this shouldn't introduce additional overhead.
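For concreteness, a hedged sketch of that trick (hypothetical name, not the actual Base change; staged functions are spelled @generated in current Julia):

@generated function inner_rand(::Type{T}) where {T<:AbstractFloat}
    # This body runs at compile time with the concrete T; the
    # expression it returns becomes the method body compiled for
    # Tuple{Type{T}}, so the widening heuristic does not apply.
    :(convert(T, rand()))  # stand-in for the real type-specific sampler
end

inner_rand(Float32)  # compiled and inferred for Tuple{Type{Float32}}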