glm::lookAt equivalent?

sgorsten / linalg

linalg.h is a single header, public domain, short vector math library for C++

The Unlicense


glm::lookAt equivalent? #29

Closed: ghost closed 3 years ago

ghost commented 3 years ago

n00b question: I use the factory functions a lot. My math skills aren't great. How would I go about look-at functionality in different coordinate spaces (OpenGL, Vulkan, DirectX, etc.)?

EDIT: Apparently, the function in question does a transform from world space "into the specific eye space that the projective matrix functions...are designed to expect". Hmm. A more generalized function might be better.

ghost commented 3 years ago

Direct translation from GLM:

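// Right-handed look-at view matrix, translated from glm::lookAt.
// linalg::mat is built from columns, so each braced group below is one column.
template<class T> linalg::mat<T,4,4> lookAt(
        linalg::vec<T,3> eye,
        linalg::vec<T,3> center,
        linalg::vec<T,3> up ){
    linalg::vec<T,3> f = linalg::normalize(center - eye);         // forward: from eye toward center
    linalg::vec<T,3> u = linalg::normalize(up);
    linalg::vec<T,3> s = linalg::normalize(linalg::cross(f, u));  // side: the camera's right
    u = linalg::cross(s, f);                                      // true up, orthogonal to s and f

    return {
        { s.x, u.x, -f.x, 0 },
        { s.y, u.y, -f.y, 0 },
        { s.z, u.z, -f.z, 0 },
        { -linalg::dot(s, eye), -linalg::dot(u, eye), linalg::dot(f, eye), 1 }
    };
}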

The above works as intended (with linalg.h default settings), but I'm still not sure I understand it. Apparently, I'm not the only one. 1 2

I assume user confusion is the reason for excluding the function from linalg.h. So, I ponder what kind of lookAt-like function might be more fitting to the design of linalg.h, since pointing something at something else is a common thing to want to do.

sgorsten commented 3 years ago

Hey, thanks for your interest, and for doing the legwork to get things started.

The main reason why linalg.h doesn't yet have a glm::lookAt(...) equivalent is that one hasn't been requested yet. The library arose several years back out of discussions between a handful of developers I've tended to collaborate with, and we would mostly describe things like cameras in terms of a position vector and an orientation quaternion, or perhaps pitch and yaw scalars, from which an orientation quaternion can easily be constructed. Given these quantities, a local-to-world transform matrix can be constructed via the pose_matrix(...) function, and a view matrix is simply the inverse thereof.
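The yaw-and-pitch description above can be sketched as follows. This is a minimal, self-contained illustration rather than linalg.h code: the `Vec3` and `Quat` structs and the helpers `rotation_quat`, `qmul`, `qrot`, and `camera_orientation` are defined locally for the example, and the conventions (yaw about +Y, pitch about +X, camera looking down its local -Z axis) are assumptions chosen for the sketch.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // xyz = vector part, w = scalar part

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Unit quaternion for a rotation of `angle` radians about the unit vector `axis`.
Quat rotation_quat(Vec3 axis, double angle) {
    const double s = std::sin(angle / 2.0);
    return {axis.x * s, axis.y * s, axis.z * s, std::cos(angle / 2.0)};
}

// Hamilton product: the result rotates by b first, then by a.
Quat qmul(Quat a, Quat b) {
    return {a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
            a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z};
}

// Rotate v by the unit quaternion q: v' = v + 2 * qv x (qv x v + w * v).
Vec3 qrot(Quat q, Vec3 v) {
    const Vec3 qv{q.x, q.y, q.z};
    Vec3 t = cross(qv, v);
    t = {t.x + q.w * v.x, t.y + q.w * v.y, t.z + q.w * v.z};
    const Vec3 c = cross(qv, t);
    return {v.x + 2.0 * c.x, v.y + 2.0 * c.y, v.z + 2.0 * c.z};
}

// Assumed convention: pitch about local +X is applied first, then yaw about world +Y.
Quat camera_orientation(double yaw, double pitch) {
    return qmul(rotation_quat({0.0, 1.0, 0.0}, yaw),
                rotation_quat({1.0, 0.0, 0.0}, pitch));
}
```

Rotating the local axes `{1,0,0}`, `{0,1,0}`, and `{0,0,1}` by this quaternion gives the three direction columns of the camera's pose matrix; the camera position fills the fourth column, and the view matrix is the inverse of the result.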

The second reason why the factory functions in linalg.h are a little anemic, and have carried the comment

// Factory functions for 3D spatial transformations (will possibly be removed or changed in a future version)

for nearly five years now, is that the form of view and projection matrices varies quite a bit based on your coordinate system conventions (left/right-handed, y-up/y-down, z-forward/z-back, normalized device coordinate ranges, etc.), and I've yet to come up with a construction that feels clean and general, which would apply equally well across traditional and modern OpenGL, Metal/DirectX, Vulkan, etc.

That said, there's no fundamental reason that linalg.h shouldn't include a lookat_matrix(...) of some sort, and perhaps even a lookat_quat(...) to construct a rotation quaternion from a forward vector and an up vector.

sgorsten commented 3 years ago

By the way, just for fun, I think it can be fairly easily shown that:

{
  { s.x, u.x, -f.x, 0 },
  { s.y, u.y, -f.y, 0 },
  { s.z, u.z, -f.z, 0 },
  { -linalg::dot(s, eye), -linalg::dot(u, eye), linalg::dot(f, eye), 1 }
};

is equivalent to:

inverse(mat<T,4,4>{{s, 0}, {u, 0}, {-f, 0}, {eye, 1}});

That is, the view matrix is the inverse of a "pose matrix" for the camera. Within that matrix s, u, and -f are the direction vectors that the camera's "local x", "local y", and "local z" axes are pointing, and eye is the position of the camera's local origin.

Given that s, u, and -f are known to be an orthonormal basis, and we have zeroes and ones in specific places, there's a known efficient form for the inverse, which is why the typical "lookat matrix" has a simple, if not exactly easily understandable, form.

If I were to add a lookat_matrix(...) to linalg.h, it would probably use a form similar to the above, with some conditionals in the appropriate places, to support y-up/y-down and z-back/z-forward conventions.
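The equivalence claimed above can be checked numerically. The sketch below does not use linalg.h; it assumes a small hand-rolled `Vec3` and a column-major `Mat4` (indexed `m[column][row]`), and the names `pose_from_frame`, `view_direct`, and `max_identity_error` are hypothetical helpers introduced only for this illustration. If the direct look-at form really is the inverse of the pose matrix, their product should be the identity.

```cpp
#include <array>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    const double len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Column-major 4x4 matrix: m[c][r] is column c, row r.
using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[c][row] += a[k][row] * b[c][k];
    return r;
}

struct CameraFrame { Vec3 s, u, f; };  // side, up, forward

CameraFrame make_frame(Vec3 eye, Vec3 center, Vec3 up) {
    const Vec3 f = normalize(sub(center, eye));
    const Vec3 s = normalize(cross(f, up));
    return {s, cross(s, f), f};
}

// Pose matrix: columns are the camera's local x, y, z axes and its position.
Mat4 pose_from_frame(const CameraFrame& c, Vec3 eye) {
    return {{ {c.s.x, c.s.y, c.s.z, 0.0},
              {c.u.x, c.u.y, c.u.z, 0.0},
              {-c.f.x, -c.f.y, -c.f.z, 0.0},
              {eye.x, eye.y, eye.z, 1.0} }};
}

// The "direct" look-at form, written column by column.
Mat4 view_direct(const CameraFrame& c, Vec3 eye) {
    return {{ {c.s.x, c.u.x, -c.f.x, 0.0},
              {c.s.y, c.u.y, -c.f.y, 0.0},
              {c.s.z, c.u.z, -c.f.z, 0.0},
              {-dot(c.s, eye), -dot(c.u, eye), dot(c.f, eye), 1.0} }};
}

// Largest absolute deviation of m from the identity matrix.
double max_identity_error(const Mat4& m) {
    double err = 0.0;
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            err = std::fmax(err, std::fabs(m[c][r] - (c == r ? 1.0 : 0.0)));
    return err;
}
```

For any eye, center, and up that are not degenerate (eye different from center, up not parallel to the view direction), `max_identity_error(mul(view_direct(c, eye), pose_from_frame(c, eye)))` should be zero up to floating-point rounding.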

ghost commented 3 years ago

Thanks for the clarification!

"the form of view and projection matrices varies quite a bit based on your coordinate system conventions (left/right-handed, y-up/y-down, z-forward/z-back, normalized device coordinate ranges, etc.), and I've yet to come up with a construction that feels clean and general, which would apply equally well across traditional and modern OpenGL, Metal/DirectX, Vulkan, etc."

There must be a "right way" to do it. People praise Vulkan for its use of coordinates. (+Y is down, z-buffer is 0-1, etc.) OpenGL birthed GLM, and then there's "old" DirectX and "new" DirectX, which AFAIK work differently starting with DX12. Not that the "right way" can be imposed.

"perhaps even a lookat_quat(...) to construct a rotation quaternion from a forward vector and an up vector."

Rotations are the hard part. If you start with lookat_quat and assign a position to the last vector of the matrix, or use the pose_matrix function, you're done. Of course, an intermediate quaternion step would be slightly less efficient. Could be used for quaternion interpolation, smoothly shifting gaze from one target to another.

Your simplification of the function makes sense to me except where you omit dot products. I don't have a college math background; I guess the omission has to do with orthonormality.

The conditionals you mentioned are a big step towards generality, but you mentioned other differences between APIs. Is there any graphics API that the factory functions currently don't work with?

By the way, small is good. Lobster uses bits of this library, I'm using the library in an upcoming commercial game, and you've got almost 600 stars on it. People clearly like it.

sgorsten commented 3 years ago

Basically, if you have a 4x4 matrix of the form:

|   M    v |
| 0 0 0 1 |

where M is a 3x3 submatrix, and v is a 3-element column vector, then you can invert the 4x4 matrix by replacing M with the inverse of M, and replacing v with the matrix product of the inverse of M and -v. Then, if you further know that M is orthonormal (as all proper rotation matrices are), its inverse is simply its transpose.

In my "make a pose matrix and invert it" formulation above, we use s, u, and -f to build the first three columns of our transformation matrix, and we use eye to build the final column. In the "build the lookat matrix directly" formulation, s, u, and -f are used to fill the ROWS of the matrix (hence, forming the transpose of the top left 3x3 portion) and the dot products are essentially inlining the multiplication of that transposed matrix by -eye, except that of course in the z-component the -eye and the -f cancel out and we just take the dot product of eye and f.
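Written out, the two paragraphs above amount to the following, where the only assumptions are the symbols already used in this thread (M is the 3x3 block whose columns are s, u, and -f, and v is eye):

```latex
\begin{pmatrix} M & v \\ 0 & 1 \end{pmatrix}^{-1}
  = \begin{pmatrix} M^{-1} & -M^{-1} v \\ 0 & 1 \end{pmatrix},
\qquad
M = \begin{pmatrix} s & u & -f \end{pmatrix}, \quad
M^{-1} = M^{\mathsf{T}} = \begin{pmatrix} s^{\mathsf{T}} \\ u^{\mathsf{T}} \\ -f^{\mathsf{T}} \end{pmatrix}

-M^{\mathsf{T}}\,\mathrm{eye}
  = -\begin{pmatrix} s \cdot \mathrm{eye} \\ u \cdot \mathrm{eye} \\ -f \cdot \mathrm{eye} \end{pmatrix}
  = \begin{pmatrix} -\,s \cdot \mathrm{eye} \\ -\,u \cdot \mathrm{eye} \\ f \cdot \mathrm{eye} \end{pmatrix}
```

The last vector is exactly the fourth column of the direct look-at form: two negated dot products, and a positive one for f because the two minus signs cancel.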

Anyway, yeah, I tend to be of the opinion that Vulkan did things almost right, by choosing an x-right, y-down, z-forward system. The one choice that I personally would have made differently would be to set normalized device coordinates as going from 0 to 1 for all three axes, instead of going from -1 to 1 for x and y and 0 to 1 for z. A hypothetical rendering API which would use 0 to 1 for its normalized device coordinates would be able to use the same projection matrices for rendering as for sampling from projective textures (shadow maps, projectors, etc.), as texture coordinates are quite uncontroversially defined from 0 to 1 in every API under the sun.

At any rate, I might take a little time over the weekend and see if I can come up with a reasonably general and satisfactory implementation of lookat_quat(...) and lookat_matrix(...).

ghost commented 3 years ago

Unable to log into GH or reply from webmail, so sending from my phone.

Most APIs are -1 to 1 NDC because it's a simpler perspective divide, albeit less correct.

Your dot product explanation is over my head, but thanks for trying! :)

Small is good. No need to bloat the library. There's also glm::ortho etc. Where does it end?


TestingPlant commented 3 years ago

I've made a pull request implementing lookat_matrix here.

sgorsten commented 3 years ago

W.r.t. the simpler perspective divide: it isn't.

OpenGL is the outlier using -1 to +1 for the z-axis. DirectX, Metal, and Vulkan all use 0 to 1, and the projection matrix math is simpler when using that range. OpenGL even introduced https://docs.gl/gl4/glClipControl to allow users to opt into the zero-to-one convention. There are also accuracy advantages to using an inverted zero-to-one range, using 1 for the near clip plane and 0 for the far clip plane, at least when using floating point depth buffers, and this can have a big advantage in the quality of shadow mapping techniques, etc.

If you go ahead and work out the math for using a zero to one range in the x and y axes, it turns out the projection matrix gets even simpler than before. Using zero to one on all three axes results in the fewest total number of arithmetic operations to compute frustum matrices, orthographic matrices, etc. While no major GPU rendering API uses a zero-to-one cube for normalized device coordinates, they ALL use zero-to-one ranges for texture coordinates, which means if you're ever doing any sort of projective texturing in a shader, you've already had to construct, explicitly or implicitly, the projection math for a zero-to-one range. If some hypothetical future API were to at least allow you to opt into a zero-to-one cube for normalized device coordinates, your projection math for render targets and for textures would be identical.
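To make the depth-range comparison concrete, here is a small sketch of the post-divide depth for the three conventions mentioned above. These are the standard perspective-projection formulas rather than anything taken from linalg.h; the function names are hypothetical, `d` is assumed to be the positive distance in front of the camera, and `n` and `f` are the near and far plane distances with 0 < n < f.

```cpp
#include <cmath>

// OpenGL-style range: near plane maps to -1, far plane maps to +1.
double ndc_depth_neg_one_to_one(double d, double n, double f) {
    return (f + n) / (f - n) - 2.0 * f * n / ((f - n) * d);
}

// DirectX/Metal/Vulkan-style range: near plane maps to 0, far plane maps to 1.
double ndc_depth_zero_to_one(double d, double n, double f) {
    return f / (f - n) - f * n / ((f - n) * d);
}

// Inverted zero-to-one range: near plane maps to 1, far plane maps to 0.
double ndc_depth_reversed(double d, double n, double f) {
    return 1.0 - ndc_depth_zero_to_one(d, n, f);
}
```

The zero-to-one form needs fewer operations than the OpenGL form, since `f + n` and the factor of 2 both disappear, which is in line with the point above that the projection math gets simpler in that range.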

sgorsten commented 3 years ago

For the time being, I've added a lookat_quat(...), with a slightly different form to what you'd expect from GLM or GLU.

Instead of an up vector, it takes a view_y_dir vector. This is the direction in the world which corresponds to the positive y direction of your normalized device coordinates.

For a Metal or DirectX app or a y-up OpenGL app, set view_y_dir to your world's up axis, e.g., {0,1,0} for a y-up world or {0,0,1} for a z-up world.
For a Vulkan app or a y-down OpenGL app, set view_y_dir to your world's DOWN axis, which might be {0,1,0} for a y-down world, or {0,0,-1} for a z-up world.

Additionally, lookat_matrix takes a linalg::fwd_axis parameter, similar to frustum_matrix(...) and perspective_matrix(...). If this is set to linalg::neg_z (the default), then the z-axis of your view space will point backwards from the center point to the eye point. If this is set to linalg::pos_z, then the z-axis of your view space will point FORWARDS from the eye point to the center point.
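One way to picture how view_y_dir and the forward-axis choice interact is the sketch below. It is an illustration under assumptions, not the code that was committed: `Vec3`, `FwdAxis`, `Basis`, and `lookat_basis` are hypothetical names defined here, and the construction simply keeps the resulting x, y, z axes right-handed in both modes.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    const double len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Hypothetical stand-in for the forward-axis choice described above.
enum class FwdAxis { NegZ, PosZ };

// World-space directions of the view space's x, y, and z axes.
struct Basis { Vec3 x, y, z; };

Basis lookat_basis(Vec3 eye, Vec3 center, Vec3 view_y_dir,
                   FwdAxis fwd = FwdAxis::NegZ) {
    const Vec3 f = normalize(sub(center, eye));  // eye toward center
    // NegZ: view-space z points back toward the eye. PosZ: z points at the target.
    const Vec3 z = (fwd == FwdAxis::PosZ) ? f : Vec3{-f.x, -f.y, -f.z};
    const Vec3 x = normalize(cross(view_y_dir, z));
    const Vec3 y = cross(z, x);  // view_y_dir made orthogonal to z
    return {x, y, z};
}
```

With the eye at {0,0,5} looking at the origin, passing {0,1,0} as view_y_dir yields a view-space y axis of {0,1,0}, while passing the world's down axis {0,-1,0} yields {0,-1,0}, matching the y-up and y-down cases listed above. A view matrix would then be the inverse of the pose matrix built from these three axes and eye.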

ghost commented 3 years ago

The commit works great. view_y_dir is simple enough. Looking forwards vs looking back might require some documentation, but otherwise no complaints.