Closed BeastLe9enD closed 4 weeks ago
What you are seeing here is the Slang-generated code first constructing the matrix, unambiguously, from the data in the uniform buffer. To ensure a consistent data layout on all targets we support, Slang always lowers matrices to arrays of vectors in constant buffer declarations.
As for runtime performance: since the driver will scalarize everything, the final code that reaches the GPU should not contain any of these steps and will run just as fast. Most GPU architectures have no concept of a matrix, so matrices are lowered to scalar registers anyway.
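To illustrate the point, here is a minimal CPU-side sketch in C++. It is not Slang or spirv-cross output, and every name in it (`Vec4`, `Mat4`, `LoweredBuffer`, `DirectBuffer`, `rebuildMatrix`, and so on) is hypothetical. It assumes the lowered form stores one vector per matrix row. The idea it shows is that a matrix stored as four vectors occupies the same bytes as the matrix itself, and that rebuilding the matrix before use only copies scalars, so both paths perform the same arithmetic.

```cpp
// Hypothetical types for illustration only; none of these names come from Slang.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

// Stand-in for a constant buffer whose matrix was lowered to an array of vectors.
// Assumption: one vector per matrix row.
struct LoweredBuffer { Vec4 rows[4]; };

// Stand-in for a constant buffer that declares the matrix directly.
struct DirectBuffer { Mat4 matrix; };

// Both forms hold the same 16 floats in the same number of bytes.
static_assert(sizeof(LoweredBuffer) == sizeof(DirectBuffer),
              "lowered and direct forms have the same size");

// Rebuild the matrix from its four vectors (the "mat4 from 4 vec4" pattern).
// This is pure data movement: 16 scalar copies and no arithmetic.
Mat4 rebuildMatrix(const LoweredBuffer& buf) {
    Mat4 out{};
    for (int r = 0; r < 4; ++r) {
        out.m[r][0] = buf.rows[r].x;
        out.m[r][1] = buf.rows[r].y;
        out.m[r][2] = buf.rows[r].z;
        out.m[r][3] = buf.rows[r].w;
    }
    return out;
}

// Matrix-vector product written out as scalar multiply-adds,
// which is roughly what remains once everything is scalarized.
Vec4 transform(const Mat4& mat, const Vec4& v) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            out[r] += mat.m[r][c] * in[c];
        }
    }
    return Vec4{out[0], out[1], out[2], out[3]};
}

// Path 1: load four vectors, rebuild the matrix, then transform.
Vec4 transformLowered(const LoweredBuffer& buf, const Vec4& v) {
    return transform(rebuildMatrix(buf), v);
}

// Path 2: use the matrix as declared.
Vec4 transformDirect(const DirectBuffer& buf, const Vec4& v) {
    return transform(buf.matrix, v);
}
```

Both paths run the same sixteen multiply-adds on the same values. The extra construction step in the first path is a set of copies that an optimizing compiler can be expected to fold away, which is the argument made above for the driver's shader compiler.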
Let us know if you find that this causes any actual performance problems on any GPU architecture; we can generate the code differently if that is the case.
Closing since this is intended compiler behavior and we don't think it leads to unoptimized code or degraded performance. Feel free to reopen if this is not the case.
Let's say I have the following shader (I'm targeting SPIR-V with -emit-spirv-directly, and my version is v2024.1.17):
I compiled it with
./slangc -emit-spirv-directly example.slang -o example.vert.spv && spirv-cross example.vert.spv
and the resulting GLSL code looks like the following (I'm using spirv-cross to translate the resulting SPIR-V to GLSL):

When I compile the equivalent file with dxc:
dxc -T vs_6_7 -E main -spirv -Fo example_dxc.vert.spv example.slang && spirv-cross example_dxc.vert.spv
the resulting code looks much simpler:

What is the reason for this? Why is the mat4 constructed from 4 vec4? Isn't this code unoptimized?