shader-slang / slang

[SPIRV] Resulting matrix vector product code looks unoptimized #4316

Closed BeastLe9enD closed 4 weeks ago

BeastLe9enD commented 4 weeks ago

Let's say I have the following shader (I'm targeting SPIR-V with -emit-spirv-directly, and my version is v2024.1.17):

struct Constants {
    float4x4 view_projection_matrix;
};

[[vk::push_constant]] Constants constants;

[shader("vertex")]
float4 main(): SV_Position {
    return mul(constants.view_projection_matrix, float4(1.0, 0.0, 1.0, 1.0));
}

I compiled it with ./slangc -emit-spirv-directly example.slang -o example.vert.spv | spirv-cross example.vert.spv, and the resulting GLSL code looks like the following (I'm using spirv-cross to translate the resulting SPIR-V to GLSL):

#version 450

struct _MatrixStorage_float4x4_ColMajorstd140
{
    vec4 data[4];
};

struct Constants_std140
{
    _MatrixStorage_float4x4_ColMajorstd140 view_projection_matrix;
};

uniform Constants_std140 constants;

void main()
{
    gl_Position = vec4(1.0, 0.0, 1.0, 1.0) * mat4(vec4(constants.view_projection_matrix.data[0].x, constants.view_projection_matrix.data[1].x, constants.view_projection_matrix.data[2].x, constants.view_projection_matrix.data[3].x), vec4(constants.view_projection_matrix.data[0].y, constants.view_projection_matrix.data[1].y, constants.view_projection_matrix.data[2].y, constants.view_projection_matrix.data[3].y), vec4(constants.view_projection_matrix.data[0].z, constants.view_projection_matrix.data[1].z, constants.view_projection_matrix.data[2].z, constants.view_projection_matrix.data[3].z), vec4(constants.view_projection_matrix.data[0].w, constants.view_projection_matrix.data[1].w, constants.view_projection_matrix.data[2].w, constants.view_projection_matrix.data[3].w));
}

Let's say I compile the equivalent file with dxc: dxc -T vs_6_7 -E main -spirv -Fo example_dxc.vert.spv example.slang | spirv-cross example_dxc.vert.spv. The resulting code looks much simpler:

#version 450

struct type_PushConstant_Constants
{
    mat4 view_projection_matrix;
};

uniform type_PushConstant_Constants constants;

void main()
{
    gl_Position = vec4(1.0, 0.0, 1.0, 1.0) * constants.view_projection_matrix;
}

What is the reason for this? Why is the mat4 constructed from 4 vec4s? Isn't this code unoptimized?

csyonghe commented 4 weeks ago

What you are seeing here is that the Slang-generated code first constructs the matrix unambiguously from the uniform buffer. To ensure a consistent data layout on all targets we support, Slang always lowers matrices to arrays of vectors in constant buffer declarations.

As for runtime performance: since the driver will scalarize everything, the final code that hits the GPU should not contain any of these steps and will perform just as fast. Most GPU architectures have no concept of a matrix, so matrices are lowered to scalar registers anyway.

Let us know if you find that this causes any actual performance problems on any GPU architecture, and we can generate code differently if that is the case.

csyonghe commented 4 weeks ago

Closing since this is intended compiler behavior and we don't think it leads to unoptimized code or degraded performance. Feel free to reopen if this is not the case.
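
For readers who want to check the arithmetic behind the scalarization argument, here is a minimal Python sketch. It is not Slang, spirv-cross, or driver code: the function names and the sample values in data are made up for illustration. It only assumes that data stands for the four vec4 entries of the lowered matrix struct and that GLSL's v * M computes, per component, the dot product of v with a column of M. Under those assumptions, rebuilding the mat4 and multiplying produces exactly the same scalar multiply-adds as working on the stored vectors directly.

```python
# Illustrative sketch only; all names here are hypothetical.

def rebuild_columns(data):
    """Mimic the mat4(...) constructor in the generated GLSL:
    column j of the rebuilt matrix collects component j of each stored vec4."""
    return [[data[i][j] for i in range(4)] for j in range(4)]

def vec_times_mat(v, columns):
    """GLSL `v * M`: component j is dot(v, column j of M)."""
    return [sum(v[i] * col[i] for i in range(4)) for col in columns]

def scalarized(v, data):
    """The same product written directly on the stored vectors,
    without building a matrix temporary."""
    return [sum(v[i] * data[i][j] for i in range(4)) for j in range(4)]

# Made-up stand-in for constants.view_projection_matrix.data[0..3].
data = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
v = [1.0, 0.0, 1.0, 1.0]  # the vector used in the shader above

via_matrix = vec_times_mat(v, rebuild_columns(data))
direct = scalarized(v, data)
print(via_matrix)  # [23.0, 26.0, 29.0, 32.0]
print(direct)      # [23.0, 26.0, 29.0, 32.0]
```

Both paths read the same 16 scalars and perform the same multiply-adds in the same order, which is why a scalarizing compiler has nothing extra to execute for the rebuilt matrix. Whether a particular driver actually removes the temporary is something only profiling on that driver can confirm.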