xamarin / xamarin-macios

.NET for iOS, Mac Catalyst, macOS, and tvOS provide open-source bindings of the Apple SDKs for use with .NET managed languages such as C#
Other
2.44k stars 510 forks source link

SCNMatrix4 Breaking Change Breaks SCNNode.Transform #15094

Closed praeclarum closed 2 years ago

praeclarum commented 2 years ago

Short Version

TLDR; The recent change (https://github.com/xamarin/xamarin-macios/pull/13695) in .NET 6 to make SCNMatrix4 column-major breaks the Transform property of SCNNode. The change correctly identified that SCNMatrix4 was bound incorrectly, but the solution to the problem is currently incomplete and makes the class behave in inconsistent ways.

Long Version

Here is a video of me walking through discovering and explaining the problem: https://www.twitch.tv/videos/1491014116?t=00h15m47s

Linear Algebra Primer

In linear algebra, the order or multiplication is important.

A*B != B*A

If one wishes to use a matrix for transforming points, one must choose if points (vectors) will be multiplied on the left hand side of the matrix or on the right. If the vector is on the left, it must, by dimensional analysis, be a row-vector (shaped 1x4). If it is on the right, it must be a column vector (4x1).

M*vc ~ [4x4]*[4x1] (for col vector)
vr*M ~ [1x4]*[4x4] (for row vector)

This has repercussions in how the matrices are composed. If vectors are multiplied on the left (row vectors), then matrices should be composed so the left most operation happens first. If however, vectors are multiplied on the right (column vectors), then matrices should be composed so the first operation is on the right and the last is on the left.

The Column-major vs Row-major Debate

Regardless of what's in a matrix and how it was composed and intended to be used, it can be stored in either row or column major format. This decision is orthogonal to whether left or right vector multiplication will be used for actual transforms.

A History Lesson

There is a strange history in .NET to favor row-vectors over column vectors. I say strange because all of science, engineering, and pretty much every 3D engine in the universe uses column vectors.

SceneKit also uses column vectors. Please note that I am saying column vectors and not column-major element ordering. It does also use column-major ordering, but let's table that for the moment.

When SceneKit was bound 8 years ago, SCNMatrix4 should have been implemented as a Column-vector Column-major matrix. Sadly, it was implemented as a Row-vector Row-major matrix.

A Modern Tragedy

This mistake was caught early on, but it wasn't understood that the mistake was two-fold.

It was noticed that if you add a Transpose to the matrix before it is passed to SceneKit (SCNNode.Transform for example) then the matrix acted the same way as Apple's matrices (that are column-vector, column-major).

Sadly, the conclusion was that the mistake was purely due to the column-major, row-major mixup.

It wasn't recognized that the matrices were actually constructed incorrectly. Because the matrices have a Row-vector interface for construction (for example, SCNMatrix4.CreateTranslation, etc.) the order in which they were composed was backwards (compared to the Apple examples).

For .NET 6, the matrices were correctly switched to be Column-major element ordering. https://github.com/xamarin/xamarin-macios/pull/13695

However the APIs were not fixed to make the matrices Column-vector transformers (they remain row-vector transformers). This puts them in a terribly awkward state where they act like row-vector transformers when used by themselves and with vectors. But when used with SceneKit that is expecting Column-vector, Column-major element ordering, they don't work at all.

To demonstrate this inconsistency I have written some unit tests. Allow me to explain them next.

Unit Tests to Demonstrate Inconsistencies

Demonstrate that SCNMatrix4 is Row-vector, Column-major

I wish to first demonstrate that the matrix class itself isn't broken, but is indeed constructed in a Row-vector manner. Let's first demonstrate that simple translation works fine:

void TranslatePoint ()
{
    // Create test point
    var point = new SCNVector3 (1, 2, 3);
    // Create translation
    var matrix = SCNMatrix4.CreateTranslation (10, 0, 0);
    // Transform the point
    var newPoint = SCNVector3.TransformPosition (point, matrix);
    AssertEqual (new SCNVector3 (11, 2, 3), newPoint);
}

That works. To demonstrate that the matrix assumes left-hand vector multiplication (which means the vectors have to be row vectors), let's compose two transformations, one after the other. We will do a translation followed by a scaling operation. This is a a good test because the order of these operations matters a lot and is very evident.

void TranslateThenScalePoint ()
{
    var point = new SCNVector3 (1, 2, 3);
    var matrix =
        SCNMatrix4.CreateTranslation (-1, 0, 0) *
        SCNMatrix4.Scale (10, 1, 1);
    var newPoint = SCNVector3.TransformPosition (point, matrix);
    AssertEqual (new SCNVector3 (0, 2, 3), newPoint);
}

Note that the translation operation had to come before (on the left) the scale operation. This means it runs first, then the scale. This is proven out in the test because the X coordinate of the point is first translated then scaled:

  1. X = (1 + T)*S
  2. X = (1 + (-1))*10
  3. X = 0*10
  4. X = 0

If you instead put the scale on the left of the translation, then you note that the result is quite different.

  1. X = 1*S + T
  2. X = 1*10 + (-1)
  3. X = 10 - 1
  4. X = 9

We can see from this code and its results that the matrices are indeed constructed assuming row vectors. This is independent of the memory order of elements.

The memory order of the matrices doesn't really make a difference here because the algorithms correctly address the fields based on whether they're rows or columns.

The problem is: this hack doesn't work with SCNNode, because it's incomplete.

Demonstrate that the matrix does not work with SCNNode

The .NET 6 matrix hack does not fix how matrices are used with SCNNodes. This is because, although effort was put into them being Column-major, no effort was put into making them Column-vector constructed. This means the matrices are still the transpose of what they should be. One problem was fixed, while the other, much bigger problem, was ignored.

To demonstrate this confusing state of affairs. Let us now attempt to use one of the .NET 6 matrices with an SCNNode. We will do the simplest thing possible, translate the node using a translation matrix.

void TranslateNode ()
{
    // Create a test node (it defaults to position 0,0,0)
    var node = SCNNode.Create ();
    // Create a translation matrix
    // (we know from before this will be built as a row-vector transformer)
    var matrix = SCNMatrix4.CreateTranslation (1, 2, 3);
    // Use that matrix to transform the node
    node.Transform = matrix;
    // Ask the node to extract just the translation part of the matrix
    var newPoint = node.Position;
    // Verify that it is now positioned at (1,2,3)
    AssertEqual (new SCNVector3 (1, 2, 3), newPoint);
}

This code fails even though we just showed that translation matrices work.

The problem is this: the matrix is constructed as a row-vector transformer. It is encoded column-major. Because it's column-major the Microsoft binding passes it directly to SceneKit. BUT SceneKit wants a column-vector transformer. The result is, the transform is not applied correctly (the translation is actually stored in the homogenous skew coordinates, not a good place).

This means that you still have to do a Transpose operation when using these matrices on SCNNodes BUT you must use the normal (not transposed version) when transforming points.

This is terribly inconsistent and makes writing 3D math impossible.

Proposed Solutions

The Easy Solution

Just go back to row-vector, row-major matrices. I know they're wrong. But they're wrong in an internally consistent way. They work the same way with point transforms as they do with SCNNodes and friends.

Then there would be no broken breaking change and we can all get on with our lives.

The Correct Solution

SCNMatrix4 should be designed and implemented as a Column-vector, Column-major matrix (or a new struct should be introduced). It would match perfectly with SceneKit, OpenGL, mathematics, engineering, and basically everything else in the universe. It would be both internally consistent and work with SceneKit without any transpositions.

I wish this was the breaking change introduced in .NET 6. Instead, the change we got is only a partial solution.

Workarounds

If you are coming into this bug, completely confused by .NET 6's matrices, let me give you this advice about how to work around its new design:

  1. Compose all transformations assuming row vectors (left-most operations happen before right-most).
  2. If you wish to transform points, use those matrices directly.
  3. If you wish to use those transforms with SceneKit, then you have to transpose them before setting them as properties.
  4. If you wish to read transforms from SceneKit, you will have to transpose them before combining them with other matrices or reading data from them.

Conclusion

Thank you for coming to my TED talk. I know that everything I wrote can sound pedantic, but I hope I demonstrated through the tests and examples that the new (.NET 6) SCNMatrix4 operates in a very inconsistent manner. This is due to not fully understanding why the original binding in SceneKit was incorrect and only doing a partial fix (fixed the element ordering but neglected to fix the construction and transformation methods).

Attached Tests and Repro

SCNMatricNET6Boug.zip

rolfbjarne commented 2 years ago

This is rather unfortunate, and I'm sorry we got this wrong.

Just to make sure I understand correctly, a potential fix would be to change this:

https://github.com/xamarin/xamarin-macios/blob/d2e0a364d68adda966140ffb435195c5fa6657e2/src/SceneKit/SCNMatrix4_dotnet.cs#L519

to:

result.Column3 = new SCNVector4 (x, y, z, 1);

(and the equivalent change in the result of the SCNMatrix4 class).

Great bug report btw!

praeclarum commented 2 years ago

Yes, exactly. Such changes to the construction methods would make them column-vector transformers. (Translation in the last column is how I always remember column vectors).

Unfortunately, there are a lot of those construction methods. The Transform methods also have to be changed (they’re doing a left multiply right now).

Matrix multiply should be fine because it’s universal. But I have seen some matrix multiplies that ignore the skew elements of the matrix and that would make it different (because it’s not doing the full 4x4 multiply).

So happy my bug report made sense to you. Sorry it was so late, I didn’t know SceneKit was being modified.

rolfbjarne commented 2 years ago

I've looked a bit more into this, and oh dear Apple, they haven't made things easy.

Step 1 is to try to answer the question: "does SCNMatrix4 really use a column-major layout?"

  1. There's nowhere in the headers that say whether SCNMatrix4 is row-major or column-major. There's no API that returns/works with rows or columns, they all act on the entire matrix. The only hint I could find was the layout of the definition:

    typedef struct SCNMatrix4 {
        float m11, m12, m13, m14;
        float m21, m22, m23, m24;
        float m31, m32, m33, m34;
        float m41, m42, m43, m44;
    } SCNMatrix4;

    This kind of implies row-major (it "looks that way"), but it's not explicit stated.

  2. The simd_float4x4 struct is column major, according to how it's defined:

    typedef struct { simd_float4 columns[4]; } simd_float4x4;

    This looks pretty obvious.

  3. The conversion from simd_float4x4 to SCNMatrix4 clearly says that they're identical (either both row-major or column-major), because it's just blitting memory:

    NS_INLINE SCNMatrix4 SCNMatrix4FromMat4(simd_float4x4 m) {
        return *(SCNMatrix4 *)&m;
    }
  4. The opposite conversion (from SCNMatrix4 to simd_float4x4) makes it clear they're both column-major:

    NS_INLINE simd_float4x4 SCNMatrix4ToMat4(SCNMatrix4 m) {
        return (simd_float4x4){
            .columns[0] = simd_make_float4(m.m11, m.m12, m.m13, m.m14),
            .columns[1] = simd_make_float4(m.m21, m.m22, m.m23, m.m24),
            .columns[2] = simd_make_float4(m.m31, m.m32, m.m33, m.m34),
            .columns[3] = simd_make_float4(m.m41, m.m42, m.m43, m.m44)
        };
    }
  5. While there's no obvious SCNMatrix4 API telling us whether SCNMatrix4 is row-major or column-major, it has an effect on matrix multiplication.

    Say we have two matrices:

    $$A=\begin{bmatrix}1 & 0 & 0 & 2\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{bmatrix}$$

    and

    $$B=\begin{bmatrix}3 & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{bmatrix}$$

    multiplying these two gives us:

    $$AB=\begin{bmatrix}3 & 0 & 0 & 2\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{bmatrix}$$

    now if the input matrices was transposed (i.e. switched between row-major and column-major), then we'd get this result:

    $$X=\begin{bmatrix}1 & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\2 & 0 & 0 & 1\end{bmatrix}$$

    $$Y=\begin{bmatrix}3 & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{bmatrix}$$

    $$XY=\begin{bmatrix}3 & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\6 & 0 & 0 & 1\end{bmatrix}$$

    Now let's create a translation SCNMatrix4:

    SCNMatrix translate = SCNMatrix4MakeTranslation (2, 0, 0);

    and we get back the identity matrix, except that m41=2:

    SCNMatrix4(m11: 1.0, m12: 0.0, m13: 0.0, m14: 0.0, m21: 0.0, m22: 1.0, m23: 0.0, m24: 0.0, m31: 0.0, m32: 0.0, m33: 1.0, m34: 0.0, m41: 2.0, m42: 0.0, m43: 0.0, m44: 1.0)

    and let's create a scale SCNMatrix4:

    SCNMatrix scale = SCNMatrix4MakeScale (3, 1, 1);

    and this is the identity matrix, except that m11=3:

    SCNMatrix4(m11: 3.0, m12: 0.0, m13: 0.0, m14: 0.0, m21: 0.0, m22: 1.0, m23: 0.0, m24: 0.0, m31: 0.0, m32: 0.0, m33: 1.0, m34: 0.0, m41: 0.0, m42: 0.0, m43: 0.0, m44: 1.0)

    Note that the scale matrix is the same for both row-major and column-major layouts.

    Finally let's multiply these two:

    SCNMatrix transform = SCNMatrix4Mult (translate, scale);

    and we get back:

    SCNMatrix4(m11: 3.0, m12: 0.0, m13: 0.0, m14: 0.0, m21: 0.0, m22: 1.0, m23: 0.0, m24: 0.0, m31: 0.0, m32: 0.0, m33: 1.0, m34: 0.0, m41: 6.0, m42: 0.0, m43: 0.0, m44: 1.0)

    If SCNMatrix4 uses row-major layout, that would be:

    $$rowTransform=\begin{bmatrix}3 & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\6 & 0 & 0 & 1\end{bmatrix}$$

    And if SCNMatrix4 uses column-major layout, that would be:

    $$colTransform=\begin{bmatrix}3 & 0 & 0 & 6\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{bmatrix}$$

    Now let's say SCNMatrix4 uses column-major layout, it would mean the translate matrix would be:

    $$colTranslate=\begin{bmatrix}1 & 0 & 0 & 2\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{bmatrix}$$

    That's the A matrix from above, which multiplied with the scale matrix gives the AB matrix from above, which is neither the row-major (rowTransform) nor the column-major (colTransform) interpretation of the multiplication results.

    So let's try the other way, assume SCNMatrix4 uses row-major layout, it would mean the translate matrix would be:

    $$rowTranslate=\begin{bmatrix}1 & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\2 & 0 & 0 & 1\end{bmatrix}$$

    That's the X matrix from above, which multiplied with the scale matrix gives the XY matrix from above, which is exactly the transform matrix when we assume SCNMatrix4 is using row-major layout (rowTransform).

    In the end I guess I've just proved that:

    $$(A^t)(B^t) ≠ (AB)^t$$

  6. So there seems to be a mismatch between what SCNMatrix4 and simd_float4x4 do.

    Let's try converting the SCNMatrix4 matrices we have to simd_float4x4 and multiply those types instead.

    let translate = SCNMatrix4MakeTranslation (2, 0, 0)
    let scale = SCNMatrix4MakeScale (3, 1, 1)
    let transform = SCNMatrix4Mult (translate, scale)
    
    let translateSimd = float4x4.init (translate)
    let scaleSimd = float4x4.init (scale)
    let transformSimd = translateSimd * scaleSimd
    
    print (transform)
    print (transformSimd)

    that results in:

    SCNMatrix4(m11: 3.0, m12: 0.0, m13: 0.0, m14: 0.0, m21: 0.0, m22: 1.0, m23: 0.0, m24: 0.0, m31: 0.0, m32: 0.0, m33: 1.0, m34: 0.0, m41: 6.0, m42: 0.0, m43: 0.0, m44: 1.0)

    and

    simd_float4x4([[3.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [2.0, 0.0, 0.0, 1.0]])

    There's no need to study these results much, in the SCNMatrix4 you can see that m41=6.0, but the value 6.0 never shows up in the simd_float4x4 results.

  7. Maybe there's something weird about how SCNMatrix4Mult works

    Maybe the SCNMatrix4Mult parameters could be reversed somehow? Reading the documentation, it clearly states that the first argument is the left operand, and the second argument is the right operand, so no luck there.

    It also says: "As a transformation, the result of multiplying a matrix A by a matrix B is the transformation represented by B followed by the transformation represented by A."

    Which is how composing matrices work with column vectors:

    $$B(Ax) = (BA)x$$

    Let's test this with SCNNode:

    let test = SCNNode()
    let translate = SCNMatrix4MakeTranslation (2, 0, 0)
    let scale = SCNMatrix4MakeScale (3, 1, 1)
    test.transform = SCNMatrix4Mult (translate, scale);
    print(test.position)

    SCNVector3(x: 6.0, y: 0.0, z: 0.0)

    OK, this means the translation is applied first, then scaled (x starts off at 0, we add 2, and then multiply by 3). The opposite of what the documentation said...

    Let's try reverting the arguments to SCNMatrix4:

    let test = SCNNode()
    let translate = SCNMatrix4MakeTranslation (2, 0, 0)
    let scale = SCNMatrix4MakeScale (3, 1, 1)
    test.transform = SCNMatrix4Mult (scale, translate);
    print(test.position)

    SCNVector3(x: 2.0, y: 0.0, z: 0.0)

    This is the reverse: scaled first, then translated - once again the opposite of what the documentation for SCNMatrix4Mult says.

    For the first case (first translate, then scale), this is the resulting SCNMatrix4:

    SCNMatrix4(m11: 3.0, m12: 0.0, m13: 0.0, m14: 0.0, m21: 0.0, m22: 1.0, m23: 0.0, m24: 0.0, m31: 0.0, m32: 0.0, m33: 1.0, m34: 0.0, m41: 6.0, m42: 0.0, m43: 0.0, m44: 1.0)

    Assuming column vectors, the only way for this matrix to transform the $\begin{bmatrix}0 & 0 & 0\end{bmatrix}$ position to $\begin{bmatrix}6 & 0 & 0\end{bmatrix}$ would be if the matrix was using column-major layout:

    $$colTransform=\begin{bmatrix}1 & 0 & 0 & 6\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{bmatrix}$$

    So this is another point towards SCNMatrix4 being column-major.

    Maybe the documentation for SCNMatrix4Mult utterly wrong?

    Let's try comparing multiplication again, but reversing the arguments to SCNMatrix4Mult:

    let translate = SCNMatrix4MakeTranslation(2, 0, 0)
    let scale = SCNMatrix4MakeScale (3, 1, 1)
    let translateSimd = float4x4.init (translate)
    let scaleSimd = float4x4.init (scale)
    let transform = SCNMatrix4Mult (translate, scale)
    let transformSimd = scaleSimd * translateSimd
    print (transform)
    print (transformSimd)

    SCNMatrix4(m11: 3.0, m12: 0.0, m13: 0.0, m14: 0.0, m21: 0.0, m22: 1.0, m23: 0.0, m24: 0.0, m31: 0.0, m32: 0.0, m33: 1.0, m34: 0.0, m41: 6.0, m42: 0.0, m43: 0.0, m44: 1.0)

    and

    simd_float4x4([[3.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [6.0, 0.0, 0.0, 1.0]])

    Bingo! These matrices are identical (if they're both column-major)!

Conclusion

a) SCNMatrix4 is column-major. b) SCNMatrix4Mult reverses the argument, and does a $BA$ multiplication instead of $AB$. c) The documentation for SCNMatrix4Mult is very wrong.

rolfbjarne commented 2 years ago

Here's another argument for SCNMatrix4 being column-major:

  1. GLKMatrix4 can easily be determined to be column-major by creating a new GLKMatrix4 with the GLKMatrix4MakeWithColumns method and print the first four numbers in the struct:

    GLKMatrix4 glkm = GLKMatrix4MakeWithColumns (
        GLKVector4Make(1.0, 1.0, 1.0, 1.0),
        GLKVector4Make (2.0, 2.0, 2.0, 2.0),
        GLKVector4Make (3.0, 3.0, 3.0, 3.0),
        GLKVector4Make (4.0, 4.0, 4.0, 4.0));
    NSLog (@"%f %f %f %f", glkm.m00, glkm.m01, glkm.m02, glkm.m03);

    1.000000 1.000000 1.000000 1.000000

  2. Then we create a SCNMatrix4 with the first four numbers set to the first column in the GLKMatrix4, convert that to a GLKMatrix4, and then we print the result:

    SCNMatrix4 scn = SCNMatrix4Identity;
    scn.m11 = 1.0;
    scn.m12 = 1.0;
    scn.m13 = 1.0;
    scn.m14 = 1.0;
    
    GLKMatrix4 glkm2 = SCNMatrix4ToGLKMatrix4(scn);
    NSLog (@"%f %f %f %f", glkm2.m00, glkm2.m01, glkm2.m02, glkm2.m03);

    1.000000 1.000000 1.000000 1.000000

praeclarum commented 2 years ago

Thank you for the analysis! I did mean to do some C/Swift examples to prove out the row/column ordering. Yeah, I’m pretty sure they’re column-major also, but I didn’t have facts.

b) SCNMatrix4Mult reverses the argument, and does a BA multiplication instead of AB.

Well that’s terrifying!!! Oh dear…

As you discovered, experiments do seem to be the only way to prove this stuff out. I think the SCNNode tests are the most important. Once you get Node working, it’s pretty easy to get the matrix self-consistent.

I wrote the next section without fully noting that you said you’re on Step 1 of solving this bug. I’ll leave it here but I understand you were just trying to answer 1 of the questions.

Old Edit

I was more concerned with he construction methods (rotation, scale, translation) and less about the ordering since the ordering can be verified pretty easily:

  1. Create an SCNNode.
  2. Set it’s Position to something interesting
  3. Check which values of the matrix are set using their addresses

That procedure will pretty clearly tell you the memory order of the translation elements.

From there, we can decide if a left side vector multiplier or a right side multiplier:

  1. Create a scale node
  2. Create a translation node
  3. Add the translation node as a child of the parent node. Child nodes’ transforms are composed with the parent.
  4. Note the “world transform” of the child node - the composed transform.

The nice part of these tests is it allows you to answer these questions independent of our binding - Apple is dealing with the matrix 100% themselves. But their work can easily be read out of the class using the Position, Scale, Orientation properties.

Thanks for taking the deep dive. Can’t wait for this to get fixed!

rolfbjarne commented 2 years ago

@praeclarum I have a PR up that should fix this: #15160, could you review to see if it makes sense what I did?

rolfbjarne commented 2 years ago

I hope I got it right this time!