Concisely describe the proposed feature
Add low-level optimizations that take advantage of hardware instructions such as mad, rcp, log2, and exp2, which are single native instructions on most GPUs, and that prefer scalar operations over vector operations where possible.
For example, writing (x + 3.0f) * 1.5f produces two instructions (add and mul), whereas x * 1.5f + 4.5f produces one (mad).
Although the benefit depends heavily on the hardware, using mad-style instructions is usually a win on GPUs.
According to report [1], this kind of rewriting could improve shader performance by about 7%.
Traditionally, however, graphics programmers have done this work by hand; as report [1] puts it, "Compiler can't change operation semantics so it cant optimize for you". If the compiler could perform these optimizations, it would also save programmers' time.
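To make the quoted concern concrete, here is a minimal sketch (not taken from report [1]) that emulates float32 arithmetic in Python and compares the two forms of the example expression. The helper names f32, add_then_mul, mul_then_add, and max_relative_gap are made up for illustration, and the mad is modeled as an unfused multiply followed by an add. The two forms are algebraically equal, but each operation rounds separately, so their results may differ in the last bits for some inputs. That is presumably why a compiler would need explicit permission (a fast-math style option) before reassociating.

```python
import struct


def f32(value: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def add_then_mul(x: float) -> float:
    """(x + 3.0f) * 1.5f: an add, then a mul, each rounded to float32."""
    return f32(f32(x + 3.0) * 1.5)


def mul_then_add(x: float) -> float:
    """x * 1.5f + 4.5f: modeled as an unfused mul then add, each rounded."""
    return f32(f32(x * 1.5) + 4.5)


def max_relative_gap(samples) -> float:
    """Largest relative difference between the two forms over the samples."""
    worst = 0.0
    for x in samples:
        a, b = add_then_mul(x), mul_then_add(x)
        scale = max(abs(a), abs(b), 1e-30)  # guard against division by zero
        worst = max(worst, abs(a - b) / scale)
    return worst


if __name__ == "__main__":
    samples = [f32(i * 0.1) for i in range(1, 1000)]
    mismatches = sum(add_then_mul(x) != mul_then_add(x) for x in samples)
    print("inputs where the two forms differ:", mismatches)
    print("largest relative gap:", max_relative_gap(samples))
```

Any gap this reports is on the order of float32 rounding error, which is usually acceptable in shaders but is still a change in observable results.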
Describe the solution you'd like (if any)
I see two ways:
a. Add these operations to the IR, and add an optimization pass that rewrites expressions to use them.
b. Generate the optimized code in the backend codegen.
For (a), I am not familiar enough with compilers to offer any suggestions. For (b), I doubt it is viable, since the backend output is built from the IR.
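As a rough illustration of what option (a) might look like, the sketch below defines a toy expression IR with a Mad node and a bottom-up rewrite that folds (x + c1) * c2 into mad(x, c2, c1 * c2) and a * b + c into mad(a, b, c). Everything here (Var, Const, Add, Mul, Mad, rewrite, count_ops) is hypothetical and is not this project's actual IR or pass API; it only shows the shape of the transformation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Mul:
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Mad:
    """Hypothetical multiply-add node: a * b + c."""
    a: "Expr"
    b: "Expr"
    c: "Expr"


Expr = Union[Var, Const, Add, Mul, Mad]


def rewrite(expr: Expr) -> Expr:
    """Rewrite children first, then try to fold the current node into a Mad."""
    if isinstance(expr, (Var, Const)):
        return expr
    if isinstance(expr, Mad):
        return Mad(rewrite(expr.a), rewrite(expr.b), rewrite(expr.c))
    lhs, rhs = rewrite(expr.lhs), rewrite(expr.rhs)
    if isinstance(expr, Mul):
        # (x + c1) * c2  ->  mad(x, c2, c1 * c2), folding the constants.
        if (isinstance(lhs, Add) and isinstance(lhs.rhs, Const)
                and isinstance(rhs, Const)):
            return Mad(lhs.lhs, rhs, Const(lhs.rhs.value * rhs.value))
        return Mul(lhs, rhs)
    # expr is an Add: a * b + c  ->  mad(a, b, c)
    if isinstance(lhs, Mul):
        return Mad(lhs.lhs, lhs.rhs, rhs)
    return Add(lhs, rhs)


def count_ops(expr: Expr) -> int:
    """Count arithmetic nodes, treating Add, Mul, and Mad as one op each."""
    if isinstance(expr, (Var, Const)):
        return 0
    if isinstance(expr, Mad):
        return 1 + count_ops(expr.a) + count_ops(expr.b) + count_ops(expr.c)
    return 1 + count_ops(expr.lhs) + count_ops(expr.rhs)


if __name__ == "__main__":
    before = Mul(Add(Var("x"), Const(3.0)), Const(1.5))
    after = rewrite(before)
    print(count_ops(before), "ops ->", count_ops(after), "op:", after)
```

Because the first rule changes floating-point rounding, a real pass would likely need to be gated behind an opt-in flag rather than applied unconditionally.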
Additional comments
References:
[1] Low-Level Shader Optimization for Next-Gen and DX11, GDC 2014 (slides)
[2] Low-Level GLSL Optimisation for PowerVR