shader-slang / slang

Making it easier to work with shaders
MIT License
1.97k stars 167 forks source link

Wrong matrix operations in Metal? #4253

Closed na2axl closed 2 months ago

na2axl commented 3 months ago

Hey there,

I'm using Slang for a project, and I'm stuck with an issue about how Slang generate matrix operations on the Metal backend.

I think in Metal, matrix layouts are by default column major so AFAIK matrix-vector multiplications should look like M * v but slang generate them as v * M which produce complete wrong results.

Even chained operations like:

v' = M * v;
v'' = M' * v';

Generates as a completely reversed operations:

v'' = v * M * M';

I have to manually edit the generated shader to have correct results. I've also searched the documentation to know what I can do, and found I can change the default matrix layout when creating a slang session, but doing so doesn't change anything...

Am I missing something?

For info, I'm effectively passing column-major matrix data to shaders, and things display well when I transpose them, but transposing matrices is not what I want...

jkwak-work commented 3 months ago

Metal support is still work-in-progress. We are hoping to get to the matrix major-ness problem soon. But that is a known problem that we didn't iron out yet.

na2axl commented 3 months ago

Oh I see, thanks for the quick answer :+1:

csyonghe commented 3 months ago

The way we map matrix operations to metal is exactly the same as how we map to GLSL.

If you have mul(M, v) in HLSL, we currently will generate v*M in GLSL. If you want to produce (M*v), you will need to write mul(v, M). Note that this behavior is not related to matrix layout.

This is based on my understanding of the metal spec and how they relate to HLSL spec. I can be wrong and we still need to wait until we can run these shaders to verify its correctness.

csyonghe commented 3 months ago

Both a Metal "column" and a GLSL "column" is what HLSL would call a "row". This is because in HLSL: you write matrix[rowIndex][colIndex], and in GLSL/Metal you write matrix[colIndex][rowIndex].

In Slang, we always refer to things indexed in the first bracket as a "row", regardless how a target language is referring to it. So if my understanding is correct, m*v in metal should produce the same result as mul(v,m) in HLSL, if m is laid out in memory exactly the same on both HLSL and Metal.

na2axl commented 3 months ago

The way we map matrix operations to metal is exactly the same as how we map to GLSL.

If you have mul(M, v) in HLSL, we currently will generate v*M in GLSL. If you want to produce (M*v), you will need to write mul(v, M). Note that this behavior is not related to matrix layout.

This is based on my understanding of the metal spec and how they relate to HLSL spec. I can be wrong and we still need to wait until we can run these shaders to verify its correctness.

According to the documentation here, when doing mul(v, M), HLSL interpret v as a row vector and when doing mul(M, v), v is interpreted as a column vector.

I think that's where the issue start because as a developer I know how to layout my matrix (column vs row) to have the correct result when I choose one these overloads of mul. I think Metal is doing the same when choosing between M * v for column vector multiplication and v * M for row vector multiplication, which seems to be the opposite for OpenGL.

csyonghe commented 3 months ago

If metal defines m*v to have the opposite meaning as glsl, then we should reverse them here too.

csyonghe commented 2 months ago

This issue is being fixed here: https://github.com/shader-slang/slang/pull/4374

The way we are doing the translations, is to translate float3x2 to mat<float,3,2> in metal. Note that metal defines matrix to be mat<T, num_cols, num_rows> and HLSL defines it to be matrix<T, num_rows, num_cols>. However, if you just do string substitution of "row" to "col" and "col" to "row" in the metal spec, the result will be consistent with HLSL. For example, in HLSL, you can construct matrix<float, 3, 2> with two float3 "row" vectors. In metal, you construct mat<float, 3, 2> with 2 float3 "column" vectors. Basically, we are mapping semantically contiguous dimension as "row". There isn't an alternative translation where we can keep the matrix constructing code as is.

Consider this example in HLSL:

float3x2 mat(float3(1,2,3), float3(4,5,6));

We want to translate it to:

float3x2 mat(float3(1,2,3), float3(4,5,6));

In metal. If we translate float3x2 to float2x3 in metal (so that an HLSL "row" maps to a metal "row"), we will need to rewrite this constructor to something like:

float2x3 mat(float2(1, 4), float2(2, 5), float2(3, 6));

Which is apparently undesirable. Mapping HLSL float3x2 to Metal float3x2 means that a mul(v, M) needs to be translated to M*v and mul(M, v) needs to be translated to v*M in order to produce the same result from the HLSL code:

float3x2 mat(float3(1,2,3), float3(4,5,6));
float3 v = mul(float3(1,2,3), mat);

Otherwise, the result output won't compile by the metal compiler, because there is no overload for operator*(float3, float3x2). Metal only defines operator*(float3x2, float3).

With the fix in the PR, all of our tests are now passing running on Metal, producing consistent results to D3D and Vulkan under this definition.

na2axl commented 2 months ago

Thanks @csyonghe for the detailed explanation. I checked the PR and seen you have also fixed many other issues I've faced. I know there is more to do on the Metal backend but I think it worth to notify you there is also an issue with the translation of SV_Targetn semantics, which are translated into [user(SV_Targetn)] attributes instead to the Metal counterpart [color(n)]. If you don't plan to fix it in your current PR I can create another issue for you to track it.

csyonghe commented 2 months ago

I wasn't aware of this one. Can you create an issue for it so we can track? Thank you!