immersive-web / webxr-hand-input

A feature repo for working on hand input support in WebXR. Feature lead: Manish Goregaokar
https://immersive-web.github.io/webxr-hand-input/
Other
104 stars 17 forks source link

marked hand as sameobject + added a clarifying note #93

Closed cabanier closed 3 years ago

cabanier commented 3 years ago

Closes #91


Preview | Diff

alcooper91 commented 3 years ago

IMO, rather than the NOTE we should modify the algorithm, adding a new step 8/10 along the lines of:

foreach |joint| in |jointspaces| : 
  if the user agent cannot determine the |pose| of |joint| return false and stop processing these steps.

The last step becomes simply "return true", and the current 8.3/10.3 step can be removed.

cabanier commented 3 years ago

IMO, rather than the NOTE we should modify the algorithm, adding a new step 8/10 along the lines of:

foreach |joint| in |jointspaces| : 
  if the user agent cannot determine the |pose| of |joint| return false and stop processing these steps.

The last step becomes simply "return true", and the current 8.3/10.3 step can be removed.

Unfortunately, the joints don't all have to belong to the same hand for fillJointRadii For fillPoses they don't even have to be joint spaces.

alcooper91 commented 3 years ago

I see, so the joints in fillJointRadii/fillJointPoses aren't necessarily all of the joints for a hand or all from the same hand; I think it's still better to put an explicit check in the algorithm to call out something along the lines of validating that all other poses from the hand that |joint| belongs to are valid before providing the data (e.g. as an addition to determining if a pose can be provided for the joint); especially since I believe NOTEs like what you've added are non-normative. (Unless that is your intention).

cabanier commented 3 years ago

I see, so the joints in fillJointRadii/fillJointPoses aren't necessarily all of the joints for a hand or all from the same hand; I think it's still better to put an explicit check in the algorithm to call out something along the lines of validating that all other poses from the hand that |joint| belongs to are valid before providing the data (e.g. as an addition to determining if a pose can be provided for the joint); especially since I believe NOTEs like what you've added are non-normative. (Unless that is your intention).

A normative section higher up defines this:

When a hand is partially obscured the user agent MAY emulate the obscured joints, or it MAY report null poses for all of the joints.

So the spec already defines this behavior; it just wasn't obvious.

alcooper91 commented 3 years ago

That's fair, my feedback is mainly based on how much back/forth I had to do through the spec to understand what was happening here when I was attempting to review it this morning to understand the behavior. (e.g. We have one line about it, but then don't reference this requirement in any algorithms about querying or updating the data). Even then, I didn't think that fillJointPoses/fillJointRadii was either incomplete or referring to joints from different hands.

jmajnert commented 3 years ago

IMO, the getJointPose algoritms should be redefined to clearly state, that null should be returned if pose of any joint on the same hand cannot be determined. Then, the two other algorithms should be redefined in terms of getJointPose

cabanier commented 3 years ago
  1. Notes should not introduce normative requirements. We should update the algorithms to do exactly what we want.

There's normative text that states that if one joint has no pose, none of the joints of that hand have a pose. The note is just to clarify that this is implicitly used by the algorithm.

  1. For fillJointRadii - if user can request radii for a subset of joints, should we consider all other (not requested) joints from that hand when deciding whether return NaN or not?

Yes. The normative text defines this:

When a hand is partially obscured the user agent MAY emulate the obscured joints, or it MAY report null poses for all of the joints.

This should be clarified in the algorithm. The same for fillPoses - should we consider all spaces of a hand (even those that are not requested by caller), when deciding whether to return 16 NaNs or valid transforms?

Yes but again, implied by the normative text. If we add sorting by hand to the algorithm, it will become much more complicated.

jmajnert commented 3 years ago

Having read the Editor's Draft version I see it is clearer than the FPWD. Still, I would propose that instead of the Notes we add a specific algorithm step. Something like this perhaps: https://github.com/immersive-web/webxr-hand-input/commit/50f63061e561b450b934602a905cb56f67137e1c

alcooper91 commented 3 years ago

I wanted to comment on these two bits:

A normative section higher up defines this:

When a hand is partially obscured the user agent MAY emulate the obscured joints, or it MAY report null poses for all of the joints.

So the spec already defines this behavior; it just wasn't obvious.

If we add sorting by hand to the algorithm, it will become much more complicated.

I don't think the adding the note here makes it obvious, and I actually strongly disagree that it makes the algorithm more complicated, to me the fact that this was only implicit and not an explicit step added to a lot of confusion when trying to parse the text, as someone not intimately familiar with this spec. I don't think we necessarily need to sort by hand, but joints should have a reference back to their hand that we can use to determine if all other joints on that hand can be located. Essentially, I think we need some explicit algorithm for:

There's normative text that states that if one joint has no pose, none of the joints of that hand have a pose.

Perhaps as a compromise we could add/update a hand update algorithm that sets all poses to null if any pose on the hand is null, that way even if it was theoretically able to locate the pose the UA should have "nulled out" that pose for the frame.

We may even want to strengthen the text to be a bit more explicit and change:

When a hand is partially obscured the user agent MAY emulate the obscured joints, or it MAY report null poses for all of the joints.

To:

When a hand is partially obscured the user agent MUST either emulate the obscured joints, or report null poses for all of the joints.

cabanier commented 3 years ago

So the spec already defines this behavior; it just wasn't obvious. If we add sorting by hand to the algorithm, it will become much more complicated.

I don't think the adding the note here makes it obvious, and I actually strongly disagree that it makes the algorithm more complicated, to me the fact that this was only implicit and not an explicit step added to a lot of confusion when trying to parse the text, as someone not intimately familiar with this spec. I don't think we necessarily need to sort by hand, but joints should have a reference back to their hand that we can use to determine if all other joints on that hand can be located.

They have a reference back to their hand: Every XRJointSpace has an associated hand, which is the XRHand that created it.

Essentially, I think we need some explicit algorithm for:

There's normative text that states that if one joint has no pose, none of the joints of that hand have a pose.

hmm, I think we need to patch populate the pose to do this. I'll make an attempt.

We may even want to strengthen the text to be a bit more explicit and change:

When a hand is partially obscured the user agent MAY emulate the obscured joints, or it MAY report null poses for all of the joints.

To:

When a hand is partially obscured the user agent MUST either emulate the obscured joints, or report null poses for all of the joints.

That sounds good to me. The 2 "may"'s are indeed confusing

jmajnert commented 3 years ago

to me the fact that this was only implicit and not an explicit step added to a lot of confusion when trying to parse the text, as someone not intimately familiar with this spec

+1! I must agree. The algorithms in spec should be painfully explicit. That's what they are for. Changing their behaviour with some context in the prose of the spec that is not immediately available adds confusion and will cause interoperability problems.

cabanier commented 3 years ago

@alcooper91 @jmajnert I hoisted the populate the pose algorithm into the spec and patched it. I'm not 100% happy with it but I can't see any other way since the regular WebXR methods also need to account for this joint behavior.

cabanier commented 3 years ago

Or to modify the WebXR spec to accept extensions? Essentially all you've done is added a line in the middle of the algorithm, so perhaps the main algorithm could be patched with something like:

I think that would be ideal. We did something similar for Layers. Maybe we can submit the current work but file an issue against WebXR at the same time. I believe @toji and @Manishearth are working on getting to CR so we can't make changes right now. Maybe we should discuss if we should fork the spec to level 2 so we can still make changes.

alcooper91 commented 3 years ago

That's fair, and I'll defer to @toji and @Manishearth on that.

Would something like this be possible as a workaround to not have to copy the full text? Or I guess that technically leaves a hole for getting the poses in the reference space...

I definitely agree that it's not ideal. Would it be possible to do something like:

Algorithm: Populate Pose for XRHand
1. Let |pose| be the result of running |Populate the pose| for space in base space
2. If |space| is a {{XRJointSpace}}....
cabanier commented 3 years ago

Would something like this be possible as a workaround to not have to copy the full text? Or I guess that technically leaves a hole for getting the poses in the reference space...

You mean, only patch it for the methods in the hands spec? If so, then it wouldn't apply for the getPose function the WebXR spec.

alcooper91 commented 3 years ago

I see, then from my perspective this is fine, and helps clarify it; but I think there's no explicit modification made to fillJointRadii, which doesn't look to reference the "Populate the Pose" algorithm?

jmajnert commented 3 years ago

Sorry to be cutting in, but have you seen my proposition here: https://github.com/immersive-web/webxr-hand-input/commit/50f63061e561b450b934602a905cb56f67137e1c ?

I think it cheaply (without changes to other specs) solves our problems.

cabanier commented 3 years ago

Sorry to be cutting in, but have you seen my proposition here: 50f6306 ?

I think it cheaply (without changes to other specs) solves our problems.

getPose also needs to be patched

cabanier commented 3 years ago

I see, then from my perspective this is fine, and helps clarify it; but I think there's no explicit modification made to fillJointRadii, which doesn't look to reference the "Populate the Pose" algorithm?

I will make it call that algorithm.

cabanier commented 3 years ago

I see, then from my perspective this is fine, and helps clarify it; but I think there's no explicit modification made to fillJointRadii, which doesn't look to reference the "Populate the Pose" algorithm?

I will make it call that algorithm.

Unfortunately, I couldn't use that so I had to patch it in place. @jmajnert, I reformatted your proposal a bit. Let me know if you prefer your approach.

Manishearth commented 3 years ago

I don't understand why we need to patch the algorithm here; this hook already exists in the form of the definition for native origin: each space defines how its native origin behaves. That's what currently exists in the spec, with the wording of:

The native origin of the XRJointSpace may only be reported when native origins of all other XRJointSpaces on the same hand are being reported. When a hand is partially obscured the user agent MAY emulate the obscured joints, or it MAY report null poses for all of the joints.

except that MAY should probably be a "MUST either" as you've fixed in this PR. We could tighten up the wording to stop saying "reported" here and define the native origin as this only-when-everyone-else-exists quantity.

Manishearth commented 3 years ago

I think a way of clarifying might be to add a note on the populate the pose algorithm in the core spec that native origins may have conditions in which they are or aren't shown and to check their definition.

cabanier commented 3 years ago

I don't understand why we need to patch the algorithm here; this hook already exists in the form of the definition for native origin: each space defines how its native origin behaves. That's what currently exists in the spec, with the wording of:

The native origin of the XRJointSpace may only be reported when native origins of all other XRJointSpaces on the same hand are being reported. When a hand is partially obscured the user agent MAY emulate the obscured joints, or it MAY report null poses for all of the joints.

I agree as well. Patching the whole algorithm is ugly and IMO just as confusing. I will take it out of the PR. The notes should clarify and we should probably also add some example code.

jmajnert commented 3 years ago

The reason why we have this PR is because reviewers of a change on Chromium gerrit found the specification to be self contradicting: https://chromium-review.googlesource.com/c/chromium/src/+/3025151/comments/bfb1865f_07c0e6db Let me quote:

1) https://www.w3.org/TR/webxr-hand-input-1/#xrjointspace-interface States that if not all XRJointPoses can be identified/emulated, then all need to be reported as null. 2) https://www.w3.org/TR/webxr-hand-input-1/#dom-xrframe-filljointradii and https://immersive-web.github.io/webxr-hand-input/#dom-xrframe-fillposes both contradict this, by points 8.3/10.3 respectively, which both should probably read something like "If the pose can't be found, zero out any pre-filled data and abort" (Or adding a pre-cursor step to check for that data and abort).

Again: they read the requirement that "all or nothing" is reported and found it to contradict the algorithms. They did not take the requirement as being an amendment to the algorithms. And that's how people will read it IMHO.

I too had a hard time understanding what the specification really wanted. I thought that the whole reason of introducing algorithms in W3C specifications was to clearly and unambiguously define the expected behavior, without the need to process all of the prose of the various parts of the document and glue them together.

That being said, I'm not an expert here. If you think that the PR as it stands clarifies the expected behavior, then so be it.

cabanier commented 3 years ago

I too had a hard time understanding what the specification really wanted. I thought that the whole reason of introducing algorithms in W3C specifications was to clearly and unambiguously define the expected behavior, without the need to process all of the prose of the various parts of the document and glue them together.

No, you need to take all the prose into account when implementing a specification. For instance, in this case the decision boils to the prose in WebXR that states "Query the XR device's tracking system". The normative text in WebXR Hands defines that this should fail if any of the joints can't be tracked or emulated.

It's true that going three steps deep isn't always obvious which is why we add informative notes. Examples will also help.

Manishearth commented 3 years ago

For instance, in this case the decision boils to the prose in WebXR that states "Query the XR device's tracking system". The normative text in WebXR Hands defines that this should fail if any of the joints can't be tracked or emulated.

This is correct.