The OEC-compiled baseline executes more instructions than the dawn reference experiments for many kernels, even though both variants execute the kernels in sequence without any inlining or fusion.
I'm closing this, since I think part of it is due to the suboptimal NVPTX support (fma for integers, etc.). I think this is a minor issue at the moment, and we should focus on features and correctness.