immersive-web / hit-test

https://immersive-web.github.io/hit-test/
Other
78 stars 31 forks source link

XRRay(transform) constructor doesn't make conceptual sense #82

Closed Manishearth closed 4 years ago

Manishearth commented 4 years ago

Initialize ray’s origin to { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }.

Initialize ray’s direction to { x: 0.0, y: 0.0, z: -1.0, w: 0.0 }.

Transform ray’s origin by premultiplying the transform’s matrix and set ray to the result.

Transform ray’s direction by premultiplying the transform’s matrix and set ray to the result.

The origin part makes sense; this just sets the origin to transform.position, but the direction part doesn't. The direction part will basically give you the final position of (0, 0, 1) when transformed by the rigid transformation, which in our ray model is conceptually position + direction, assuming the ray we're looking for is the ray that the z axis of the XRRigidTransform's natural coordinate axes point towards.

Instead we should just rotate (0, 0, -1) by transform.orientation and call it a day.

Furthermore, Chromium isn't actually following this code:

void XRRay::Set(const TransformationMatrix& matrix,
                ExceptionState& exception_state) {
  FloatPoint3D origin = matrix.MapPoint(FloatPoint3D(0, 0, 0));
  FloatPoint3D direction = matrix.MapPoint(FloatPoint3D(0, 0, -1));
  direction.Move(-origin.X(), -origin.Y(), -origin.Z());

  Set(origin, direction, exception_state);
}

What chrome is doing is conceptually the same as "rotate by transform.orientation and call it a day"), except with extra steps, since matrix * (0, 0, -1) is equal to translation * rotation * (0, 0, -1), and they're just cancelling out the translation in the next step.

bialpio commented 4 years ago

Instead we should just rotate (0, 0, -1) by transform.orientation and call it a day.

This is exactly what we're doing - we're multiplying a 4x4 matrix by [0,0,-1,0] vector - the only part of the rigid transform that matters here is the part that represents rotation (translation is not applied since w = 0). It's actually selecting a single column from the rigid transform matrix and then negating it (mathematically, M * [0, 0, -1, 0]^T does exactly that), so we could simplify the algorithm to take it into account, but we might as well be general here.

Chrome's code seems ok to me, but I wrote it so of course I'd say that. :)

Let's unpack it step by step: FloatPoint3D origin = matrix.MapPoint(FloatPoint3D(0, 0, 0)); The above performs the Transform ray’s origin by premultiplying the transform’s matrix and set ray to the result. part of the spec - looking at MapPoint implementation, both translation and rotation gets applied to the origin but the rotation has no effect on (0,0,0) so it could be simplified to just "set the origin to the translation of the rigid transform".

FloatPoint3D direction = matrix.MapPoint(FloatPoint3D(0, 0, -1));
direction.Move(-origin.X(), -origin.Y(), -origin.Z());

Since TransformationMatrix class has no MapVector function (or equivalent), the second step of the spec text is done differently. Direction gets mapped through the rigid transform matrix (this also applies both rotation and translation), but it makes no sense to apply translation to a vector (it is a direction, after all), so the translation that happened needs to be reversed. Another way to look at it is to interpret direction vector as d - origin where d and origin are points, with d set to x,y,z of the direction, then map those 2 points through the matrix, M*d = d', M*origin = origin' - then, the mapped direction is d' - origin'.

Manishearth commented 4 years ago

Ah, I was treating both of these as 3-component vectors, not 4-component vectors.

Would it be worth changing the spec to directly say that origin is just the position and direction is (0, 0, -1) rotated by orientation? That seems conceptually simpler to me.

bialpio commented 4 years ago

We could definitely simplify the phrasing for the origin, but we'd have to either introduce the algorithm to rotate something by a quaternion, or maybe say that the direction is negated Z axis of the coordinate system defined by the rigid transform's 3x3 sub-matrix (but maybe in a more formal way?).

Manishearth commented 4 years ago

Maybe we should add a non normative note like in https://immersive-web.github.io/webxr/#multiply-transforms