vpenades / SharpGLTF

glTF reader and writer for .NET Standard
MIT License

Struggling with Invalid Matrix #41

Closed: ptasev closed this issue 4 years ago

ptasev commented 4 years ago

I've been struggling with matrices lately, and I'm wondering whether Matrix4x4.Decompose is too strict or the matrix is truly poorly formed.

I'm using the SceneBuilder API to put together a node hierarchy. For each node, I set the local matrix like so:

var nb = nodeBuilder.CreateNode(bone.Name);
nb.UseTranslation().Value = bone.Position;
nb.UseRotation().Value = bone.Rotation;
nb.UseScale().Value = bone.Scale;

Then, I create a mesh, and calculate the inverse bind matrices:

var nb = nodeBuilders[bb.BoneIndex];
var wm = nb.WorldMatrix;
Matrix4x4.Invert(wm, out Matrix4x4 wmi);
wmi.M44 = 1; // force to be one
return (nb, wmi);

Note: Somehow M44 becomes 0.999999, which is probably due to a precision error in the Invert method. The glTF validator doesn't like this, so I forced it to 1.

Now when I save the glTF file, I get a complaint that a matrix is invalid. I commented out the part that throws the exception just to force the file to save so I could inspect it on my own.

I believe this is due to the Matrix4x4.Decompose function returning false. I think it returns false because the determinant of the rotation part of the matrix is < 0.985 instead of 1.0, and that's enough to fail the "DecomposeEpsilon" threshold. When I use the glTF validator provided by Khronos, it says the file is valid.

I guess the question is whether these matrices are good enough and the Decompose method is too strict. I also wonder where the imprecision comes from. The local matrices are created from proper position, rotation, and scale data. Maybe if the hierarchy is deep enough, computing the world matrix loses precision after so many local matrices are multiplied together.
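
For reference, a minimal sketch of how one could inspect this (the helper name is made up, and the determinant test only approximates what Decompose does internally):

    using System;
    using System.Numerics;

    // Sketch only: report whether a TRS matrix decomposes, and how far the
    // scale-stripped 3x3 basis is from a proper rotation (determinant == 1).
    static void InspectTrsMatrix(Matrix4x4 m)
    {
        bool decomposable = Matrix4x4.Decompose(m, out _, out _, out _);

        var vx = Vector3.Normalize(new Vector3(m.M11, m.M12, m.M13));
        var vy = Vector3.Normalize(new Vector3(m.M21, m.M22, m.M23));
        var vz = Vector3.Normalize(new Vector3(m.M31, m.M32, m.M33));

        float det = Vector3.Dot(vx, Vector3.Cross(vy, vz)); // ~1 for a clean rotation basis

        Console.WriteLine($"Decomposable: {decomposable}, basis determinant: {det}");
    }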

I just wanted to get your thoughts on this, and what we can do about this. I have attached an example gltf. grnGltf.zip

Thanks!

I also posted a similar issue in the dotnet repo where Decompose returns false in an even simpler example. I was hoping someone could explain how precision was lost, and what to do about it. Initially I thought it was a bug, but now I think either the threshold is too strict, or there are precision problems. https://github.com/dotnet/runtime/issues/34874

vpenades commented 4 years ago

Yes, this is a painful issue.

In my experience, there are a number of vector operations that lose a lot of precision, and Matrix4x4.Invert is probably the one that loses the most.

Also, Quaternion multiplication is another one that loses a lot of precision, so I keep renormalizing it after almost every operation.
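
Something along these lines (just a sketch with made-up values):

    // sketch: after concatenating two rotations, renormalize to counter drift
    var local  = Quaternion.CreateFromYawPitchRoll(0.5f, 0.0f, 0.0f);
    var parent = Quaternion.CreateFromYawPitchRoll(1.0f, 2.0f, 3.0f);
    var world  = Quaternion.Normalize(Quaternion.Concatenate(local, parent));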

Another thing that might be affecting the precision is scaling; if the hierarchy root matrix has a scale of, for example, 0.01, it drags that imprecision down the whole hierarchy, making the precision loss more apparent.

Making a custom Decompose with a looser threshold is sketchy at best... which threshold would you choose? 0.95? Maybe next week someone comes asking to lower it even more because it's failing for them. Also, the glTF spec is becoming surprisingly strict in terms of precision when it comes to skinning (see the discussion about byte-quantized vertex skin weights not summing to one)... so I suspect the whole skin pipeline needs to be quite accurate.

As you guessed, the imprecision comes from the chain of matrix transforms used to compute world matrices down the hierarchy. The world matrices have a tendency to "skew".

So, there are a number of potential solutions:

  1. Use a custom Matrix4x4 based on doubles to improve the precision; this will produce more accurate world and inverse matrices that closely match those used by a game engine.
  2. Try to implement multiplication and inverse in Transforms.AffineTransform, and do the hierarchy transform using quaternions instead of matrices. This allows renormalizing the quaternion at every step.
  3. Try to "renormalize" the matrix before doing the inverse or the decompose. There are a number of algorithms to do that, like Gram–Schmidt orthonormalization.

Solutions 2 and 3 trade skewing for drifting: a renormalized quaternion or matrix will allow Invert and Decompose to pass, but it will "drift away" from its theoretically correct value as you go down the hierarchy.

Solution 1 is probably the best one, but it would require rolling out a full double-precision Matrix4x4 class and letting some parts of the library use it instead of the default single-precision one.

Solution 3 is probably the fastest one to implement, because it only requires hacking the matrix in a few spots. It is also tricky, because renormalizing would make even extremely skewed, invalid matrices appear valid.
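
As an illustration of solution 3, a generic Gram–Schmidt pass over the rotation/scale part could look like the sketch below (this is just a generic sketch; a pivot-based variant appears later in this thread, and it assumes a right-handed, non-mirrored basis):

    // Sketch of solution 3: re-orthogonalize the 3x3 basis with Gram-Schmidt,
    // preserving the axis lengths (scale). Assumes a right-handed, non-mirrored basis.
    static void OrthoNormalizeBasis(ref Matrix4x4 m)
    {
        var vx = new Vector3(m.M11, m.M12, m.M13);
        var vy = new Vector3(m.M21, m.M22, m.M23);
        var vz = new Vector3(m.M31, m.M32, m.M33);

        float lx = vx.Length(), ly = vy.Length(), lz = vz.Length();

        vx = Vector3.Normalize(vx);
        vy = Vector3.Normalize(vy - Vector3.Dot(vy, vx) * vx); // remove the X component from Y
        vz = Vector3.Cross(vx, vy);                            // rebuild Z orthogonal to both

        vx *= lx; vy *= ly; vz *= lz; // restore the original axis lengths

        m.M11 = vx.X; m.M12 = vx.Y; m.M13 = vx.Z;
        m.M21 = vy.X; m.M22 = vy.Y; m.M23 = vy.Z;
        m.M31 = vz.X; m.M32 = vz.Y; m.M33 = vz.Z;
    }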

vpenades commented 4 years ago

I've written this matrix normalization method

You can try to use it on the inverse bind matrices when you create the skins... let me know if it works well (and if the animations look fine and the glTF models are exported correctly).

If it solves the problem, I'll integrate it in the library.

ptasev commented 4 years ago

Thanks, there's certainly a lot to think about. Right now I'm testing just the world matrices with the node hierarchy in that model I attached. Even the world matrices without inverting already cause Decompose to return false.

I added your normalization function to AffineTransform.LocalToWorld (normalize after the multiplication, before returning). This improved things, but I still got a world matrix (head eye left bottom) that failed to Decompose (it was off from the Decompose threshold by the tiniest fraction, 0.98574XX).

A much simpler example is to try to normalize the inverse matrix from the example I posted in the dotnet runtime issue I opened. For that single matrix that I invert, your normalize function makes no difference at all. This makes me think that there's just inherent precision loss no matter what, and we have no choice but to go to double for matrix multiply and inverse.

I wonder how games solve this problem when creating the world matrix.

vpenades commented 4 years ago

Keep in mind the validation is performed on the matrices stored within the glTF document, not on matrices calculated from them.

As far as I remember, the only two places where glTF stores matrices are:

  1. the node local transform matrices
  2. the skin inverse bind matrices (IBMs)

It's these matrices that need to comply with decomposability, not what's calculated from them. So the idea is to apply the Normalize only to these matrices before setting them on SceneBuilder.

In particular, the inverse bind matrix needs to be normalized at the end of the process:

IBM = Inverse( Local * Local * Local * Local * Local ); IBM = Normalize( IBM );
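
In terms of the snippet from the first post, that would be roughly (sketch only, using the normalization method shown in full further down this thread):

    var nb = nodeBuilders[bb.BoneIndex];
    Matrix4x4.Invert(nb.WorldMatrix, out Matrix4x4 ibm); // Inverse( Local * Local * ... )
    NormalizeMatrix(ref ibm);                            // normalize only the matrix that gets stored
    return (nb, ibm);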

About the normalization method:

Internally, it uses vector normalization. Usually, vector normalization is more accurate when the vector has a modulus close to 1. If the source vector's length is very small, or very large, the vector normalization will result in a "normalized" vector with a modulus that is not exactly 1. This will affect the calculations of the normalization.

Finally, I've been thinking about whether I should be less strict and align with gltf-validator; that would be fine if I were using my own maths library. But since SharpGLTF relies heavily on System.Numerics.Vectors, I must play by its rules too.

ptasev commented 4 years ago

I realize that only those values get stored in the GLTF, but I'm still curious about how renderers solve the issue.

I managed to make it work as you suggested (normalize the IBM). However, I first had to fix a bug in your function: it was calculating the lengths of the columns, when it should be using the lengths of the rows, since we're using a row-major layout for our matrices. With the bug it still produced correct results, but the lengths became unusually large or small, causing more precision loss than necessary.

While this worked, I still think that there may be a case that a matrix doesn't get normalized properly.

Update: Actually, after testing the normalized version of the saved file, some of the bone positions were completely off. :S

Update 2: Maybe we shouldn't be normalizing the matrix. Instead, we could replace the Decompose check with an alternative validation: compute the inverse, multiply the original by the inverse, and see how close the result is to Identity.

        void NormalizeMatrix(ref Matrix4x4 xform)
        {
            var vx = new Vector3(xform.M11, xform.M12, xform.M13);
            var vy = new Vector3(xform.M21, xform.M22, xform.M23);
            var vz = new Vector3(xform.M31, xform.M32, xform.M33);

            var lx = vx.Length();
            var ly = vy.Length();
            var lz = vz.Length();

            // normalize axis vectors
            vx /= lx;
            vy /= ly;
            vz /= lz;

            // determine the skew of each axis (the smaller, the more orthogonal the axis is)
            var kxy = Math.Abs(Vector3.Dot(vx, vy));
            var kxz = Math.Abs(Vector3.Dot(vx, vz));
            var kyz = Math.Abs(Vector3.Dot(vy, vz));

            var kx = kxy + kxz;
            var ky = kxy + kyz;
            var kz = kxz + kyz;

            // we will use the axis with less skew as our fixed pivot.

            // axis X as pivot
            if (kx < ky && kx < kz)
            {
                if (ky < kz)
                {
                    vz = Vector3.Cross(vx, vy);
                    vy = Vector3.Cross(vz, vx);
                }
                else
                {
                    vy = Vector3.Cross(vz, vx);
                    vz = Vector3.Cross(vx, vy);
                }
            }

            // axis Y as pivot
            else if (ky < kx && ky < kz)
            {
                if (kx < kz)
                {
                    vz = Vector3.Cross(vx, vy);
                    vx = Vector3.Cross(vy, vz);
                }
                else
                {
                    vx = Vector3.Cross(vy, vz);
                    vz = Vector3.Cross(vx, vy);
                }
            }

            // axis z as pivot
            else
            {
                if (kx < ky)
                {
                    vy = Vector3.Cross(vz, vx);
                    vx = Vector3.Cross(vy, vz);
                }
                else
                {
                    vx = Vector3.Cross(vy, vz);
                    vy = Vector3.Cross(vz, vx);
                }
            }

            // restore axes original lengths
            vx *= lx;
            vy *= ly;
            vz *= lz;

            xform.M11 = vx.X;
            xform.M21 = vy.X;
            xform.M31 = vz.X;

            xform.M12 = vx.Y;
            xform.M22 = vy.Y;
            xform.M32 = vz.Y;

            xform.M13 = vx.Z;
            xform.M23 = vy.Z;
            xform.M33 = vz.Z;
        }

vpenades commented 4 years ago

You're right, it should use rows instead of columns for this case; it's especially apparent in this example using uneven scaling:

var SxR = Matrix4x4.CreateScale(5, 1, 1) * Matrix4x4.CreateFromYawPitchRoll(1, 2, 3);
AffineTransform.NormalizeMatrix(ref SxR);
Matrix4x4.Decompose(SxR, out _, out _, out _);

Update: Actually, after testing the normalized version of the saved file, some of the bone positions were completely off. :S

Can I see the saved file?

I think the precision is lost 50% in the world matrix concatenation chain, and 50% in the world matrix inverse alone. Keeping in mind that this calculation is needed ONLY to calculate the IBMs, it makes sense to write a special pipeline with enhanced precision to calculate them... once the IBMs have been baked with good enough precision, any engine should be able to consume them just fine.

Also, I suspect many engines don't have this issue because their models have been exported at scales close to 1, so there's not much precision loss (which doesn't imply that this should be the solution)...

ptasev commented 4 years ago

I have attached the gltf file with the weird bones.

If the bones have animations then you'd need to recalculate the world matrices at each frame, which means a renderer would need good precision for the world matrices too. From the perspective of the library, even though you're not saving the world matrices, someone unaware of the issue might use them to create incorrect IBMs.

grnGltfNormalizedIBM.zip

vpenades commented 4 years ago

When running in real time, the matrix used by the skin shader looks something like this:

Bone17.World = ( Bone0.Local * Bone5.Local * Bone16.Local * Bone17.Local );
Bone17.SkinX = Bone17.IBM * Bone17.World

So there's a concatenation chain of matrix multiplications, plus the IBM multiplication... but at least there's no Matrix4x4.Invert anywhere in the runtime pipeline. (The IBM is stored as the inverse of the bind matrix precisely to avoid having to calculate the inverse in real time, which is expensive.)

So if the IBMs are accurately calculated, engines should be fine.

From the perspective of the library, I believe the right thing to do is to "play nice" and always write data that can be used in a number of calculations without throwing exceptions. That means ensuring that all matrices written to a glTF can be decomposed and inverted using the de facto standard in C#, which is System.Numerics.Vectors.

Afterwards, it's up to the engines to choose appropriate strategies to ensure calculations can be done without throwing errors.

If the IBM normalization trick does not solve the issue, I believe there'll be no choice other than to write a custom Double4x4 matrix.

ptasev commented 4 years ago

I think that, without normalization, we get a better matrix for making the right computations. If you take a world matrix, invert it, and multiply the two together, the resulting matrix will be pretty close to identity. If you normalize the inverse and do the same computation, the result will be further away from Identity.

I assume this is because the world matrix itself is not normalized. And since we don't store the world matrix in glTF, there's nothing we can do about it. Therefore, we can't fix the IBM without fixing the world matrix, so it doesn't seem to make sense to normalize the IBM.

I think the validation for the IBM should be whether, when multiplied with the world matrix, it gets close enough to Identity. Local matrices can still be checked to make sure they can be decomposed.
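
A rough sketch of the check I have in mind (the tolerance is arbitrary, just for illustration):

    using System;
    using System.Linq;
    using System.Numerics;

    // Sketch: treat an IBM as valid if multiplying it by the world matrix it was
    // derived from lands close enough to identity. The tolerance is arbitrary.
    static bool IsRoughInverseOf(Matrix4x4 ibm, Matrix4x4 world, float tolerance = 1e-3f)
    {
        var d = ibm * world - Matrix4x4.Identity;

        float maxError = new[]
        {
            d.M11, d.M12, d.M13, d.M14,
            d.M21, d.M22, d.M23, d.M24,
            d.M31, d.M32, d.M33, d.M34,
            d.M41, d.M42, d.M43, d.M44,
        }.Max(x => Math.Abs(x));

        return maxError <= tolerance;
    }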

vpenades commented 4 years ago

Unfortunately it's not that simple.

As we've been discussing, the IBMs are, usually, the inverse of the world matrix of a given node. So in these cases, it is true that multiplying the IBMs by the world matrices would give you the identity matrix.

But this is only true for, maybe, 80% of all the glTF models you'll find around.

For the other 20%, the local matrices of their nodes were modified after skin binding (there are a number of valid reasons to do that), so the world matrices you can compute from the stored local matrices no longer match the IBMs; they would not give you the identity if multiplied.

I've written up one of the workflows that covers this scenario here: https://github.com/KhronosGroup/glTF/issues/1665#issuecomment-529828370

Think about it... if the IBMs were the inverse of the world matrices in all cases, there would be no need to store the IBMs in the file, because any engine would be able to calculate them on load. They're stored in the glTF precisely for those cases in which the IBM <-> world matrix relationship is lost due to changes in the hierarchy after the skin has been bound.

So, for that reason, validating the IBMs by checking them against the world matrices is not possible.

That's also the reason why SceneBuilder has an AddSkinnedMesh that allows you to define the IBMs explicitly. In fact, I am considering adding an overload of AddSkinnedMesh to determine at which time you want the IBMs to be calculated (with two options: Immediately | AtExportTime). The subtle difference is that "Immediately" would freeze the IBMs for that skin but still allow you to modify the NodeBuilder transforms afterwards, while "AtExportTime" would delay the skin binding until the end, just before you create the glTF.

ptasev commented 4 years ago

Interesting. I didn't realize that scenario existed. Well, I'm not sure what to do then. I just know normalizing doesn't give the desired effect, and it looks fine if I just bypass the validation even if some matrices aren't orthogonal.

On a side note, the dotnet team is considering adding double to Numerics in the BCL: https://github.com/dotnet/runtime/issues/24168

vpenades commented 4 years ago

Calculating the IBMs is done by the glTF library, so I believe it's important to calculate them as accurately as possible. I'm working on an IBM calculation path that will use doubles; I hope this will mitigate the problem to some degree.

This will imply some more APIs and classes while avoiding breaking the current API... so it will take a while.

On a side note, the dotnet team is considering adding double to Numerics in the BCL: dotnet/runtime#24168

I am aware of the proposal to add double-precision vectors, and I believe UWP already has some support, but I don't expect any release for a while... and I would not like to add a dependency to the library just for one class... so I'll probably roll my own Double4x4 based on the Matrix4x4 code.
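
Just to sketch the idea (this is not the type that ended up in the library, and the much longer double-precision inverse is omitted): concatenate the local matrices in doubles and only convert back to single precision at the very end.

    using System.Collections.Generic;
    using System.Numerics;

    // Sketch only: accumulate the world matrix chain in double precision,
    // converting back to Matrix4x4 at the very end.
    static Matrix4x4 WorldChainInDoubles(IEnumerable<Matrix4x4> localsFromNodeToRoot)
    {
        var w = new double[4, 4];
        for (int i = 0; i < 4; i++) w[i, i] = 1.0; // start from identity

        foreach (var local in localsFromNodeToRoot)
        {
            var l = new double[,]
            {
                { local.M11, local.M12, local.M13, local.M14 },
                { local.M21, local.M22, local.M23, local.M24 },
                { local.M31, local.M32, local.M33, local.M34 },
                { local.M41, local.M42, local.M43, local.M44 },
            };

            // row-vector convention: accumulated = accumulated * next local up the chain
            // (the node's own local first, then its ancestors up to the root)
            var r = new double[4, 4];
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    for (int k = 0; k < 4; k++)
                        r[i, j] += w[i, k] * l[k, j];

            w = r;
        }

        return new Matrix4x4(
            (float)w[0, 0], (float)w[0, 1], (float)w[0, 2], (float)w[0, 3],
            (float)w[1, 0], (float)w[1, 1], (float)w[1, 2], (float)w[1, 3],
            (float)w[2, 0], (float)w[2, 1], (float)w[2, 2], (float)w[2, 3],
            (float)w[3, 0], (float)w[3, 1], (float)w[3, 2], (float)w[3, 3]);
    }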

ptasev commented 4 years ago

I'm looking forward to testing it when it's ready. Also, I wonder if writing your own inverse function would be helpful, since you know the nature of the matrices you're working with.

vpenades commented 4 years ago

I've committed some changes: some internal code related to world matrices now uses a Matrix4x4Double, which includes matrix multiplication and inverse. It's essentially the same code as System.Numerics.Matrix4x4, but with doubles.

let me know if it works

ptasev commented 4 years ago

I'm still getting an invalid matrix on save. Here's the code I'm using:

                var joints = mesh.BoneBindings.Select(bb =>
                {
                    var nb = nodeBuilders[bb.BoneIndex];
                    return (nb, nb.GetInverseBindMatrix(null));
                }).ToArray();

I did an extra test, and this exception gets thrown too:

        public Matrix4x4 GetInverseBindMatrix(Matrix4x4? meshWorldMatrix = null)
        {
            Transforms.Matrix4x4Double mwx = meshWorldMatrix ?? Matrix4x4.Identity;

            var b = Matrix4x4.Decompose((Matrix4x4)this.WorldMatrixPrecise, out Vector3 _, out Quaternion _, out Vector3 _);
            if (!b) throw new Exception();

            return (Matrix4x4)Transforms.SkinnedTransform.CalculateInverseBinding(mwx, this.WorldMatrixPrecise);
        }
vpenades commented 4 years ago

Hmmm... the new GetInverseBindMatrix calculates the whole transform chain using doubles. Could it be possible for you to dump all the local matrices of the joints involved?

I guess you're writing a format-X-to-glTF converter... maybe the transforms being imported from format X are wrong?

Keep in mind that WorldMatrixPrecise is just a multiplication of all the local matrices, using double precision... so if the local matrices are fine, it's hard to believe the world matrix would be invalid... so maybe the local matrices are invalid to begin with? Could you check whether the local matrices from your source are decomposable?

ptasev commented 4 years ago

I did check to make sure all my local matrices are valid with that same decompose check and they all worked fine. I'll see about getting you the data, but it should all be in the gltf in the first post.

I think the easiest way to get my hierarchy is to open the glTF from the OP and disable matrix checks to be able to load it. Then you'll have the full node hierarchy with my local matrices.

vpenades commented 4 years ago

I also suspect the problem is, indeed, with System.Numerics' Decompose... it's not only very strict, but it also fails in some cases, for example:

var SxR = Matrix4x4.CreateScale(5, 1, 1) * Matrix4x4.CreateFromYawPitchRoll(1, 2, 3);   
Matrix4x4.Decompose(SxR, out _, out _, out _); // Succeeds!

var RxS = Matrix4x4.CreateFromYawPitchRoll(1, 2, 3) * Matrix4x4.CreateScale(5, 1, 1);
Matrix4x4.Decompose(RxS, out _, out _, out _); // Fails!

I really need to check the data to see why it's failing, and whether the problem lies in the data, the transform chain, or the Decompose method. In the latter case, as you suggested in the beginning, I'll have no choice but to remove it, which is something I dislike because some parts of the library depend on matrix decomposition.

ptasev commented 4 years ago

Yeah, it baffles me that examples like yours fail at all. Your example is similar to the one I posted in the dotnet runtime issue that I linked in my OP.

Worst case, you can make your own decompose method. We should still investigate where the loss is coming from, though.

vpenades commented 4 years ago

I believe the decompose is failing because, in the example above, SxR remains orthogonal while RxS becomes skewed, and Decompose is unable to work on skewed matrices.

There's some discussion about whether glTF should support skewed matrices here and here (link fixed)

But I am beginning to believe that decomposability is only a requirement for node local matrices, and requiring decomposability of IBMs is simply not possible, because they can indeed be skewed.

ptasev commented 4 years ago

Edit: I just ran through a lot of different scenarios, and it doesn't seem like the double version really helps with the issue. I also tried an implementation that completely skipped matrices and concatenated the world transform using the T, R, and S separately. That also didn't make a difference. I don't know what the solution is.

I am curious about the glTF validator's current IBM validation, though. I wonder what its limits are.

vpenades commented 4 years ago

The only scenario that makes sense to me is that some local transforms have uneven scaling, which produces skewed transforms down the hierarchy chain. Given that this would be a valid scenario, it means that local matrices should be decomposable, but world matrices might not be.

And given that IBMs are the inverse of world matrices, it means that IBMs should meet the "invertible" validation, but not necessarily the "decomposable" validation.

I've just committed some changes that update the validation, so IBMs are only validated for inversion.
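
In other words, the intent is roughly this (sketch only, with placeholder variables; not the library's actual validation code):

    bool localIsValid = Matrix4x4.Decompose(localMatrix, out _, out _, out _); // node locals: must decompose
    bool ibmIsValid   = Matrix4x4.Invert(ibm, out _);                          // IBMs: just need to be invertible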

ptasev commented 4 years ago

Yes, I think you're right. I just wish I could find something that explains this mathematically in the context of TRS matrices. I also realize now that the columns/rows of orthogonal matrices are not just orthogonal but also orthonormal, so I've been using that term wrong, and scaled matrices are not orthogonal.

Anyway, I think this is the correct solution. I don't think the double Matrix4x4 made much of a difference in the end, and it can probably be removed.

By the way, here's my implementation of skipping matrices, and using only SRT parts to create a world matrix. The code is really ugly since I put it together really fast for my testing. https://gist.github.com/ptasev/bce22ca61d9da83a702c2011229b76ab

vpenades commented 4 years ago

I understand the difference between SxR and RxS is that in the latter, the uneven scaling is applied to the axes of an already unaligned (previously rotated) matrix, and that produces a skewed matrix. It is not possible to do this with a single local matrix, but it is possible with an unevenly scaled parent node and a rotated child node.
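
Expressed as a two-node hierarchy, the failing RxS case from earlier corresponds to something like this (sketch; row-vector convention as used by System.Numerics):

    // Sketch: an unevenly scaled parent with a rotated child produces a skewed
    // world matrix, equivalent to the RxS example earlier in the thread.
    var parentLocal = Matrix4x4.CreateScale(5, 1, 1);              // uneven scale in the parent
    var childLocal  = Matrix4x4.CreateFromYawPitchRoll(1, 2, 3);   // rotation in the child
    var childWorld  = childLocal * parentLocal;                    // row vectors: child local * parent local
    var ok = Matrix4x4.Decompose(childWorld, out _, out _, out _); // false, same as RxS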

I also realize now that the columns/rows of orthogonal matrices are not just orthogonal

Yes, that's why my original code for matrix normalization used columns instead of rows. But I've made some further checks, and for some reason, it only works well if applied to rows.

I don't think the double matrix4x4 made much of a difference in the end and can probably be removed.

It doesn't hurt either, and I believe it can be useful to improve precision with long hierarchy chains, so I'll probably leave it.

By the way, here's my implementation of skipping matrices...

That class looks very much like Transforms.AffineTransform; maybe I could add that functionality to it.