This hotfix fixes the performance of the VUV kernel:
When two-stage tuning was enabled, the VUV kernel appeared to regress (#1291).
This turned out not to be a regression per se. Rather, it exposed a long-standing issue: the launch parameters of the VUV kernel weren't being set correctly outside of the tuning loop.
Specifically, the Arg::coarse_color_wave parameter, which maps to TuneParam::aux, wasn't being set in the apply function; it was only being set during the tuning process.
So while the kernel autotuned correctly, after tuning it would not use the desired parameters, leading to reduced performance.
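For reviewers who want the failure mode in miniature, here is a hedged sketch of the pattern. The struct layouts and the functions launch_without_fix, launch_with_fix and demo are hypothetical stand-ins written for illustration only; they borrow the names TuneParam::aux and Arg::coarse_color_wave from the description above but are not the actual definitions or the actual apply function.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real definitions.
struct TuneParam {
  int aux = 1;  // auxiliary tuning knob selected by the autotuner
};

struct Arg {
  int coarse_color_wave = 1;  // kernel-side parameter that should mirror TuneParam::aux
};

// Sketch of the old behaviour outside the tuning loop: the tuned aux value
// is never copied into the kernel argument, so the stale value is used.
inline int launch_without_fix(Arg &arg, const TuneParam &tp) {
  (void)tp;                      // tuned value ignored for coarse_color_wave
  return arg.coarse_color_wave;  // value the kernel would actually run with
}

// Sketch of the fixed behaviour: every launch copies aux into the argument
// before the kernel runs, whether or not tuning is in progress.
inline int launch_with_fix(Arg &arg, const TuneParam &tp) {
  arg.coarse_color_wave = tp.aux;
  return arg.coarse_color_wave;
}

// Small self-check: with a tuned aux of 4, only the fixed path uses it.
inline void demo() {
  TuneParam tp;
  tp.aux = 4;
  Arg stale;
  assert(launch_without_fix(stale, tp) == 1);
  Arg fixed;
  assert(launch_with_fix(fixed, tp) == 4);
}
```

The point of the sketch is only that the copy from the tuned parameter into the argument has to happen on the normal launch path, not just inside the tuning loop.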
I have also made some minor changes to the tuning:
Display a warning if the best second-stage tuning time regresses by more than 10% versus the first stage. In general we should expect the second stage to be faster than the first stage, since it will generally involve more iterations. The 10% margin allows for latency bubbles, etc.
Even then, this warning can still be triggered if one doesn't lock clocks for small, short-running kernels.
Fix a minor bug when printing the aux parameter while tuning.
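The 10% regression check can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in this PR: the function names second_stage_regressed and warn_if_regressed, the tolerance argument, and the message format are all hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: true when the best second-stage time is more than
// `tolerance` (as a fraction, 0.10 = 10%) slower than the best first-stage time.
inline bool second_stage_regressed(double first_stage_time, double second_stage_time,
                                   double tolerance = 0.10) {
  assert(first_stage_time > 0.0);  // timings are assumed to be positive
  return second_stage_time > (1.0 + tolerance) * first_stage_time;
}

// Hypothetical wrapper: prints a warning to stderr and reports whether it fired.
inline bool warn_if_regressed(const char *kernel_name, double first_stage_time,
                              double second_stage_time) {
  const bool regressed = second_stage_regressed(first_stage_time, second_stage_time);
  if (regressed) {
    const double percent = 100.0 * (second_stage_time / first_stage_time - 1.0);
    std::fprintf(stderr, "Warning: %s second-stage tuning is %.1f%% slower than first stage\n",
                 kernel_name, percent);
  }
  return regressed;
}
```

With these assumptions, a second-stage time of 1.2 against a first-stage time of 1.0 would trigger the warning, while 1.05 would not.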
Visual review looks great; I'm always a fan of a simple fix. I've got a due-diligence build+run going now, and once that's complete I'll give it the approval + merge.
Replace new/delete with a std::vector.
Closes #1291