Manishearth closed this issue 4 years ago
Is transform needed? It seems a UA can calculate it from position and orientation
void populatePoses(XRFrame frame);
We probably should also pass XRHand here.
Is transform needed? It seems a UA can calculate it from position and orientation
Right, we'll calculate it for the user, the same way we do for XRRigidTransform. It can be lazily populated.
We probably should also pass XRHand here.
Nah that should be in the constructor
Is transform needed? It seems a UA can calculate it from position and orientation
Right, we'll calculate it for the user, the same way we do for XRRigidTransform. It can be lazily populated.
The conversion is so easy; we shouldn't add this to the API.
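As an illustration of how easy that conversion is, here is a sketch (the helper name is made up, not part of any proposed API) of deriving a column-major 4x4 matrix from a position and a unit-quaternion orientation, which is the same computation a UA performs for XRRigidTransform.matrix:

```javascript
// Sketch only: derive a column-major 4x4 transform matrix from a
// position {x, y, z} and a unit quaternion orientation {x, y, z, w}.
// poseToMatrix is a hypothetical helper, not a spec'd API.
function poseToMatrix(position, orientation) {
  const { x, y, z, w } = orientation;
  const m = new Float32Array(16); // zero-initialized

  // Rotation part (standard quaternion-to-matrix expansion, column-major)
  m[0] = 1 - 2 * (y * y + z * z);
  m[1] = 2 * (x * y + z * w);
  m[2] = 2 * (x * z - y * w);
  m[4] = 2 * (x * y - z * w);
  m[5] = 1 - 2 * (x * x + z * z);
  m[6] = 2 * (y * z + x * w);
  m[8] = 2 * (x * z + y * w);
  m[9] = 2 * (y * z - x * w);
  m[10] = 1 - 2 * (x * x + y * y);

  // Translation part lives in the last column, as in WebXR matrices
  m[12] = position.x;
  m[13] = position.y;
  m[14] = position.z;
  m[15] = 1;
  return m;
}
```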
We probably should also pass XRHand here.
Nah that should be in the constructor
Yes, you're right :-)
I find it weird that we would return transforms in XRPose but not here, and I'd also like to avoid people calling new XRRigidTransform(pos, orient).matrix
A thing to note: my current API does not reduce the JS-system traffic much. If we wish to do that we'd probably have to introduce a way to get large frozen arrays wholesale from the API.
A low-level solution would be providing something like a Float32Array to the API, which it would populate with the data. That of course would be something "non-webby", although it's the highest-performance and allocation-free option:
interface XRAllJoints {
constructor(XRHand hand, XRSpace baseSpace);
void getPositions(XRFrame frame, Float32Array data);
void getOrientations(XRFrame frame, Float32Array data);
...
}
Regarding transform, it is interesting to see whether people use it, or actually use position and orientation instead. In particular, how does this relate to engines with different coordinate systems: do they first modify position/orientation and then construct the transform, or do they take the transform and flip its axes?
In PlayCanvas we have the same coordinate system as the WebXR API; we used position + orientation in the case of hands, and used transform elsewhere because getPose returned only a transform (with position and orientation inside of it). So it seems like there are some inconsistencies already.
Could we further simplify this with just a call on XRFrame:
partial interface XRFrame {
bool getJointPoses(XRHand hand, XRSpace baseSpace, Float32Array positions, Float32Array orientations);
}
You'd want radii as well. But yes, we could do that. It feels less webby but we do have a webby API too.
I think frameworks are not going to be using the flat arrays as flat arrays, though.
If you use DOMPoint, each attribute access is a trip to C++, so that's a bit expensive as well.
we do have a webby API too.
yes.
I'm not sure if I understand the motivation behind this returning position/orientation/transform separately instead of XRRigidTransforms? Then the question of whether or not matrices are included becomes moot. (Also, I'm not a fan of the idea that transform everywhere else means XRRigidTransform and here it means a matrix.)
Could we further simplify this with just a call on XRFrame..
Yeah, Rik's snippet is a bit more what I had in mind when I brought this up on the call. I was actually wondering about taking it a step further and allowing this variant to be used for ANY spaces, like so:
partial interface XRFrame {
bool getPoses(sequence<XRSpace> spaces, XRSpace baseSpace, Float32Array positions, Float32Array orientations);
}
At which point I could use this for everything but the viewer pose if I wanted. Coincidentally, since the hand is just an enumerable list of joint spaces it would still work to simply do this:
frame.getPoses(inputSource.hand, referenceSpace, outPositions, outOrientations);
Regarding transform, it is interesting to see if people use it, or actually use position and orientation instead.
I know that at least Three.js uses transform matrices directly.
I'm not sure if I understand the motivation behind this returning position/orientation/transform separately instead of XRRigidTransforms?
We can't! XRRigidTransforms contain DOMPointReadOnlys, so suddenly we have a way of modifying them, which is ... bad. What would happen if I took one of these transforms and slapped it into getOffsetReferenceSpace()? 😄
Yeah, Rik's snippet is a bit more what I had in mind when I brought this up on the call. I was actually wondering about taking it a step further and allowing this variant to be used for ANY spaces, like so:
I felt that having a single output object would be very useful here. We could even give it both APIs, with a DOMPoint getter and a Float32Array attribute for each thing; lazy initialization would mean these only get updated when you need them.
partial interface XRFrame {
bool getPoses(sequence<XRSpace> spaces, XRSpace baseSpace, Float32Array positions, Float32Array orientations);
}
I like that one! This way authors can efficiently get a subsection.
We do want to pass XRJointSpaces though, and return radii.
Yeah, Rik's snippet is a bit more what I had in mind when I brought this up on the call. I was actually wondering about taking it a step further and allowing this variant to be used for ANY spaces, like so:
I felt that having a single output object would be very useful here. We could even give it both APIs, with a DOMPoint getter and a Float32Array attribute for each thing; lazy initialization would mean these only get updated when you need them.
If we can avoid it, we should not introduce new objects. It's also an additional burden on developers because now they have to keep track of an additional helper object.
Yeah but it's neater than having to keep track of three different array objects IMO. I don't mind moving the API to the frame, but I still think we should have a single object being populated, ideally one which can expose multiple views over this data.
What do framework authors think? @Maksims @mrdoob @dmarcos
Should we have a single object XRAllJoints that manages the poses, or a single API call on XRFrame that returns separate arrays to describe the poses?
cc @raananw too
To be clear, my current proposal is
interface XRAllJoints {
constructor(XRHand hand); // can also be a method on XRHand
// Attributes for directly feeding to the GPU
readonly attribute Float32Array positions;
readonly attribute Float32Array orientations;
readonly attribute Float32Array radii;
// Getters for feeding into, e.g., Three.js object metadata
DOMPoint getPosition(unsigned long id);
DOMPoint getOrientation(unsigned long id);
Float32Array getTransform(unsigned long id);
float getRadius(unsigned long id);
}
partial interface XRFrame {
void populatePoses(XRAllJoints joints, XRSpace baseSpace);
}
You would use the API like this:
// at the beginning of the session
let leftJoints = new XRAllJoints(leftInput.hand);
// each frame
frame.populatePoses(leftJoints, localSpace);
for (let i = 0; i < number_of_joints; i++) {
leftModel.joints[i].transform = leftJoints.getTransform(i);
leftModel.joints[i].radius = leftJoints.getRadius(i);
}
// render leftModel
but you can also directly feed the Float32Arrays to the GPU. This would likely be done along with https://github.com/immersive-web/webxr-hand-input/issues/36 to ensure that the total number of joints is fixed.
We can design these to lazy-load so that, by default, if you are not using getPosition() those objects will not be updated.
No strong opinions on one function per joint vs. retrieve all joints at once. Whatever protects the most against allocations / GC pauses. Maybe I just expect consistency with other parts of the spec. A-Frame uses transform.matrix for both joints and head pose. Good work everybody.
frame.getPoses probably won't work, as it conflicts with Pose and JointPose.
Although a unified way of getting pose data into pre-allocated, reusable Float32Arrays is definitely a good thing.
I believe accessing input-source-related data should happen per input source, not for multiple of them in one go. From the application-logic side, this is because input sources can be dynamically added and removed, and tracking information can be lost on them. So the case where tracking is lost on one hand but not the other is a common one.
What about this:
boolean XRHand.getPoses(XRSpace baseSpace, Float32Array transforms, Float32Array positions, Float32Array orientations, Float32Array radii);
Where any of the Float32Array arguments can be null/undefined. This provides the ability to query only the information needed. Also, should it return true/false based on whether it populated the information, depending on tracking state?
@Maksims This is not for multiple input sources at one go, this is for one XRHand at a time, and each XRHand is tied to an input source. This is true in both my proposed API (where XRAllJoints is constructed from an XRHand) and @cabanier's (where getJointPoses() accepts an XRHand).
Also, should it return true/false based on whether it populated the information, depending on tracking state?
Yeah this is also why I prefer the XRAllJoints approach, because each getter can return null if it wishes.
It's worth figuring out (separately, in https://github.com/immersive-web/webxr-hand-input/issues/36) if we want to make it so that either the entire hand is reported or none of it is. In that case the API you and @cabanier proposed (in https://github.com/immersive-web/webxr-hand-input/issues/37#issuecomment-665720921) would work, but I still prefer having a separate object.
Also, should it return true/false based on whether it populated the information, depending on tracking state?
Yeah this is also why I prefer the XRAllJoints approach, because each getter can return null if it wishes.
XRAllJoints feels a bit heavyweight. I'd much prefer if we could get by with a couple of function calls.
Thinking about this some more, we might be able to simplify the spec quite a bit.
For instance, the radius is not relative to another XRSpace, so there's no real reason to require a baseSpace. This means that XRJointPose, and thus getJointPose, are not really needed and could be dropped; we could get by with the regular getPose.
The interface could look like this:
partial interface XRFrame {
float? getJointRadius(XRJointSpace joint); // replaces getJointPose
// for bulk operations
bool getPoses(sequence<XRSpace> spaces, XRSpace baseSpace, Float32Array positions, Float32Array orientations);
bool getJointRadii(sequence<XRJointSpace> spaces, Float32Array radii);
};
getPoses could move to the WebXR spec since it's not specific to hands. We might even be able to get rid of XRJointSpace. We might also consider using a sequence of DOMPointInit instead of Float32Array, but then each access would cross the C++ boundary.
You would use the API like this:
// each frame
if (frame.getPoses(leftJoints, localSpace, leftModel.joints.positions, leftModel.joints.orientations)
&& frame.getJointRadii(leftJoints, leftModel.joints.radii)) {
// render hand;
} else {
// remove hand
}
bool getPoses(sequence<XRSpace> spaces, XRSpace baseSpace, Float32Array positions, Float32Array orientations);
How does this work with tracking loss, though? Another array of bools?
bool getPoses(sequence<XRSpace> spaces, XRSpace baseSpace, Float32Array positions, Float32Array orientations);
How does this work with tracking loss, though? Another array of bools?
At least for Oculus, poses have an all-or-nothing success rate, so a simple bool return suffices. For UAs that can do partial hand results, they could put NaNs in as return values, or we could define that they should return emulated values.
At least for Oculus, poses have an all or nothing success rate so a simple bool return suffices. For UA's that can do partial hand results, they could put NaN's as return values.
Well, the array is not just XRJointSpaces 😄 . This is kinda why I want to focus on joints and do this "for the entire hand", because OpenXR is also all-or-nothing here.
Well, the array is not just XRJointSpaces 😄 . This is kinda why I want to focus on joints and do this "for the entire hand", because OpenXR is also all-or-nothing here.
Ok, then we could define it as "if there is tracking loss, try to return an emulated value. If an emulated value can't be returned, return NaN"
Or alternatively, combine the two calls into getJointPoses and only make it work for hands.
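For illustration, here is a consumer-side sketch of the "fill with NaNs on tracking loss" convention being discussed (the helper is hypothetical, not part of any proposal):

```javascript
// Sketch: under the NaN convention, a single NaN anywhere in a filled
// array means this frame's data should be treated as untracked.
// jointsTracked is a hypothetical consumer helper.
function jointsTracked(filledArray) {
  for (let i = 0; i < filledArray.length; i++) {
    if (Number.isNaN(filledArray[i])) return false;
  }
  return true;
}
```

A bool return value (as in the Oculus all-or-nothing case) avoids this scan entirely, which is part of the trade-off under discussion.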
Yeah, but this is kinda why I prefer the AllJoints model, because we can do both: consumers who wish to pass data to the GPU can use the arrays, consumers who want DOMPoints can check the DOMPoints. There's some JS-native traffic there but the main goal is fixing allocations IMO.
I'm not against getPoses(), though. I just find it potentially unnecessary. An advantage of that model is that you can pick and choose what you want: if you want positions, you can ask for positions; if you want transforms, you can ask for transforms (and null out the others).
Perhaps we should have something like this instead?
dictionary FillPoses {
Float32Array? positions;
Float32Array? orientations;
Float32Array? transforms;
}
interface XRFrame {
void fillPoses(sequence<XRSpace> spaces, XRSpace base, FillPoses poses);
}
The advantage here is that it can be called as fillPoses(spaces, base, {transforms: transforms}) or fillPoses(spaces, base, {positions: positions, orientations: orientations}) per preference, and you don't have to pass in nulls.
Idk, I'm not a fan of passing a million parameters to a function 😄
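To make the dictionary shape concrete, here is a polyfill-style sketch of how fillPoses() could behave; fakeQueryPose is a made-up stand-in for the UA's internal pose lookup, and all names here are illustrative:

```javascript
// Stand-in for the UA's internal per-space pose query; returns a fixed
// dummy pose purely for demonstration.
function fakeQueryPose(space, baseSpace) {
  return {
    px: 1, py: 2, pz: 3,            // position
    ox: 0, oy: 0, oz: 0, ow: 1,     // orientation quaternion
    matrix: new Float32Array(16),    // 4x4 transform
  };
}

// Sketch of the proposed dictionary-based API: absent members are
// simply undefined, so only the requested views get filled.
function fillPoses(spaces, baseSpace, { positions, orientations, transforms } = {}) {
  spaces.forEach((space, i) => {
    const pose = fakeQueryPose(space, baseSpace);
    if (positions) positions.set([pose.px, pose.py, pose.pz], i * 3);
    if (orientations) orientations.set([pose.ox, pose.oy, pose.oz, pose.ow], i * 4);
    if (transforms) transforms.set(pose.matrix, i * 16);
  });
}
```

Callers who only want transforms pass {transforms}; no null placeholders are needed for the members they skip.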
Perhaps we should have something like this instead?
dictionary FillPoses {
Float32Array? positions;
Float32Array? orientations;
Float32Array? transforms;
}
interface XRFrame {
void fillPoses(sequence<XRSpace> spaces, XRSpace base, FillPoses poses);
}
That's a neat idea. Do you know of any APIs on the web platform that use this idiom? (= pass optional parameters in a dictionary that are filled in by the call)
Idk, I'm not a fan of passing a million parameters to a function 😄
I agree it's not as pretty but it does save on typing.
Not aware of any APIs that do this.
I'm not sure it saves on typing; it's quite likely that the frameworks will be stashing positions, orientations, and transforms somewhere anyway.
Not aware of any APIs that do this.
Well... then we probably shouldn't do it :-(
I'm not sure it saves on typing; it's quite likely that the frameworks will be stashing positions, orientations, and transforms somewhere anyway.
fillPoses(spaces, base, positions, orientations);
is shorter than
fillPoses(spaces, base, {positions: positions, orientations: orientations});
I believe you can just write {positions, orientations}, but more specifically I'd expect people to have these around in an object anyway. Either way, I'm unsure if this is good.
fillPoses(spaces, base, positions, orientations);
is shorter than
fillPoses(spaces, base, {positions: positions, orientations: orientations});
The first option does not allow extra arguments, like "radius" or "transforms", while the second option gives you the ability to extend the object with keys, depending on the available data.
This is actually very generic; it gives you the ability to query transforms for a gripSpace:
let data = { transforms: transformsData };
frame.fillPoses([ inputSource.gripSpace ], referenceSpace, data);
As well as joints data:
let data = { orientations: orientationData, positions: positionsData, radii: radiiData };
frame.fillPoses(handJoints, referenceSpace, data);
This suits engines which use transform or position + orientation, as well as any extra data, like the radius of joints. No allocations, no null arguments; pretty clean and self-explanatory.
The only thing the documentation has to clarify is how many floats are used per item: transforms are 4x4 matrices, so 16 floats; positions are 3 floats; orientations are 4 floats; radius is 1 float. So the developer has to create a Float32Array with at least that much capacity.
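Those sizing rules can be sketched as a small pre-allocation helper. This is illustrative only; the joint count of 25 matches OpenXR's hand joint set but the names here are made up:

```javascript
// Floats per joint for each kind of data, per the sizing rules above:
// a 4x4 transform is 16 floats, a position 3, an orientation
// (quaternion) 4, and a radius 1.
const JOINT_COUNT = 25; // illustrative; OpenXR's hand joint set size
const FLOATS_PER_JOINT = { transforms: 16, positions: 3, orientations: 4, radii: 1 };

// Allocate all reusable output buffers once, at session start, so the
// per-frame fill calls stay allocation-free.
function allocateJointBuffers() {
  const buffers = {};
  for (const [name, stride] of Object.entries(FLOATS_PER_JOINT)) {
    buffers[name] = new Float32Array(JOINT_COUNT * stride);
  }
  return buffers;
}
```

The buffers would then be created once and passed to the fill-style call every frame.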
Looking at the current API (in other modules and WebXR itself), frame.getHandPose (or Poses) feels much more consistent. The same goes for sticking with XRRigidTransforms (which are used throughout the API) rather than positions/rotations/matrices.
The simplest approach I could think of was this -
interface XRFrame {
FrozenArray<XRJointPose> getJointPoses(XRHand hand, XRSpace referenceSpace);
// while keeping the current API for specific joint queries
XRJointPose getJointPose(XRJointSpace joint, XRSpace relativeTo);
}
This will return poses for a single hand (and not both at the same time), but will not require calling the getJointPose API 20 times.
A different approach - similar to getViewerPose, we could also get an already populated XRHand Set with the relevant information. Something like this:
interface XRFrame {
FrozenArray<XRHand> getHandPoses(XRSpace referenceSpace);
}
which will require two changes - a different reference in the XRInputSource (xrInput.hand is no longer an XRHand, but an index or some form of placeholder that can be associated with the XRHand object), and that the XRHand object will contain not XRSpace but XRPose objects.
This will return poses for a single hand (and not both at the same time), but will not require calling the getJointPose API 20 times.
@RaananW Unfortunately this does not satisfy the requirements. The primary issue at hand is not that you have to call getJointPose() 20 times; it is that each frame the user agent must allocate 25 × 2 XRJointPoses (which is also 50 XRRigidTransforms and 100 DOMPointReadOnlys). It's not possible to reuse the XRRigidTransforms since they're assumed to be immutable, and furthermore we expect XRPose to be something the author may wish to hold on to at times.
Got it. But if we don't use XRRigidTransform (or introduce an XRMutableRigidTransform), this can still be a single reusable object. Both suggestions, actually.
@RaananW Yeah that's what my original API proposes
/agenda to discuss this
So I discussed this more with @toji and a couple things occurred to me:
Firstly, if we implement https://github.com/immersive-web/webxr-hand-input/issues/36 and require devices to report "all or none", we can turn XRHand into an iterable and pass it in wherever sequence<XRSpace> is required.
Secondly, overloading is possible! We can have both these APIs at once:
interface XRFrame {
void getAllPoses(sequence<XRSpace> spaces, XRSpace base, Float32Array positions, Float32Array orientations);
void getAllPoses(sequence<XRSpace> spaces, XRSpace base, Float32Array transforms);
}
Note, however, that while we can have APIs that overload sequence<>, the WebIDL overload resolution algorithm does not seem to be able to distinguish between sequence<A> and sequence<B>, and furthermore cannot distinguish between those and Float32Array; so we can only have one method per number of arguments.
We can also have a getAllJointRadii(XRHand, Float32Array radii) API that gets all joint radii at once.
I would kinda prefer to have an API that works on DOMPoints but I understand the pushback.
And how will lost tracking be handled? Fill the Float32Arrays with NaNs? Or return a boolean (modified == true) and keep the arrays untouched?
And how will lost tracking be handled? Fill the Float32Arrays with NaNs? Or return a boolean (modified == true) and keep the arrays untouched?
The functions would have to return a bool. I think @Manishearth forgot to add that since he said:
require devices report "all or none"
I would kinda prefer to have an API that works on DOMPoints but I understand the pushback.
Since these are low level APIs, I think it's ok for them to be less developer friendly in exchange for higher performance.
"all or none" works if this is a hand-specific API, but not if this is an API requesting multiple spaces in general, IMO. But we can set NaNs.
TAG discussion at https://github.com/w3ctag/design-principles/issues/231
I take it as a good sign that we're one of several groups asking about this, rather than the weird outlier. Interested to see what comes of it.
@Manishearth Can you prepare a PR for this? If we wait too long, we might forget some details.
I think the issue is that there are two approaches. I can prepare a PR for both but I'm a bit busy today.
I believe your proposal in https://github.com/immersive-web/webxr-hand-input/issues/37#issuecomment-667308534 is the least controversial one.
I'll likely need to join the call late today, though I'm very interested in this topic! This exact bulk joint update API discussion was the last thing to converge in the OpenXR hand tracking discussions.
I am generally in favor of pushing on the "fill-array" style patterns here that the TAG is warming up to, as we will need them even more strongly when we are returning mesh vertex/index buffers to apps. I'm most in favor of @Manishearth's original XRAllJoints proposal over the generic getPoses approach, as I expect us to have related data to serve up here over time. I could warm to the getPoses approach, so long as we find a solution to reasonably handle emulatedPosition and tracking loss, though I expect that will ultimately feel messier than bool attributes on an XRAllJoints-style type.
@thetuvix we're discussing this now
Sorry I couldn't make it to the call on Tuesday!
My primary concern with seemingly simpler approaches like https://github.com/immersive-web/webxr-hand-input/issues/37#issuecomment-667308534 is that I suspect the getPoses overload approach will become more complex than the XRAllJoints proposal once all the data values we need to communicate are represented (e.g. radius, isDetected, isEmulated).
Here are the per-hand and per-joint values that the hand module will need to communicate each frame in some manner:

Per-hand (per XRInputSource):
- isDetected (if false, any per-joint array values should be ignored)

Per-joint:
- position/orientation (and/or transform)
- radius
- isEmulated (if true, this joint position/orientation is still usable, but the joint pose was occluded or otherwise inferred this frame relative to other nominally-tracked joints)
- linearVelocity/angularVelocity (when we otherwise add XRSpace-relative velocity to the WebXR API)

The XRAllJoints approach gives us a natural place to hang both per-hand scalars and various per-joint arrays, including any additional arrays in the future for things like linearVelocity/angularVelocity.
Note that as part of #36, we're making the key simplification for developers that if they get any joints back (i.e. isDetected is true), they will get back a usable pose and radius for all joints in the joint set they requested, even if some joints have their isEmulated bool set to true this frame.
We did decide to land https://github.com/immersive-web/webxr-hand-input/issues/36, so isDetected isn't needed anymore. isEmulated might be.
During discussion of https://github.com/immersive-web/webxr-hand-input/pull/32 in today's call it seemed like while everyone wanted to solve the problem of ~200 objects being created per frame, folks didn't want to cache XRJointPoses because it would be unexpected.
OpenXR's API for this returns a single object containing all joint pose info. We could do something similar.
We wish to keep the joint spaces because it's useful to be able to relate them to each other via getPose(), but it would be nice if there were some "all hand joints" object that could be accessed for efficiency. I'm imagining an API like:
cc @cabanier @thetuvix @toji