immersive-web / webxr-hand-input

A feature repo for working on hand input support in WebXR. Feature lead: Manish Goregaokar
https://immersive-web.github.io/webxr-hand-input/

Provide a signal that hands are doing a system gesture #117

Open cabanier opened 2 years ago

cabanier commented 2 years ago

We should provide a signal when hands are doing a system gesture. While the user is doing this, the experience should no longer try to detect gestures or draw target rays.

Currently these interfere on the Quest browser: a pinch gesture is detected when the user tries to make the gesture for the menu or Oculus buttons.

Probably a read-only boolean on XRHand would be enough. Maybe call it inSystemGesture?

/agenda
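For illustration, a minimal sketch of how a page could consume such an attribute, assuming the hypothetical read-only `inSystemGesture` boolean proposed above (nothing here is specced; plain objects stand in for real WebXR types):

```javascript
// Hypothetical: XRHand exposes a read-only `inSystemGesture` boolean
// that the render loop polls each frame, just like gamepad state.

// Decide whether the experience should run its own gesture detection
// and draw target rays for this input source.
function shouldProcessHandInput(inputSource) {
  const hand = inputSource.hand;
  if (!hand) return false;                 // not a tracked hand
  if (hand.inSystemGesture) return false;  // the OS owns the hand right now
  return true;
}

// Mock input sources standing in for real XRInputSource objects.
const idleHand = { hand: { inSystemGesture: false } };
const busyHand = { hand: { inSystemGesture: true } };
const controller = { hand: null };

console.log(shouldProcessHandInput(idleHand));   // true
console.log(shouldProcessHandInput(busyHand));   // false
console.log(shouldProcessHandInput(controller)); // false
```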

eXponenta commented 2 years ago

Events like 'gesturestart' and 'gestureend' would be more useful, because a system gesture has duration. Events are also more coherent with the WebXR API, because we already have events for every other state mutation (visibilitychange, selectstart/selectend, and others).
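A sketch of what that event-based shape could look like, assuming hypothetical `systemgesturestart`/`systemgestureend` event names (not specced); Node's built-in `EventTarget` stands in for an `XRSession`:

```javascript
// Hypothetical event pair fired by the session when the OS takes over
// and releases the hands. Event names are illustrative, not specced.
const session = new EventTarget();

// App-side bookkeeping: mirror the event pair into a flag the
// render loop can consult.
let inSystemGesture = false;
session.addEventListener('systemgesturestart', () => { inSystemGesture = true; });
session.addEventListener('systemgestureend', () => { inSystemGesture = false; });

// Simulate the OS starting and finishing a system gesture.
session.dispatchEvent(new Event('systemgesturestart'));
console.log(inSystemGesture); // true
session.dispatchEvent(new Event('systemgestureend'));
console.log(inSystemGesture); // false
```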

cabanier commented 2 years ago

WebXR input sources are based on polling, so it makes more sense to have it as an attribute (just like gamepads).

eXponenta commented 2 years ago

But a system gesture is global state. We don't always need the hand tracking API to know that a system gesture was executed. It's the same as the visibilitychange event when you press the home button: you can't track that the home button was pressed, but there is an event corresponding to it.

Manishearth commented 2 years ago

So my perspective is that WebXR (outside of the Gamepad API, which existed before us) doesn't really do the mutable field thing.

I'm not sure we should start now. That said, the first option isn't really viable here since hands come with a large pile of frames, not just one, so there's not a single logical frame to put this on. I tend to lean towards events, but @cabanier feels that a mutable field on the hand or input source would be more logical.

@toji, what do you think?

eXponenta commented 2 years ago

I'm leaning towards a hybrid API, because we already have both the 'inputsourceschange' event and the 'inputSources' field in the session state.

You can use both: check the list of sources in the update loop, or listen for the event when it changes.

There should also be a way to listen for gesture start and cancellation and to read the current hand state, because the OS can use multi-hand gestures, like clapping hands 👏

toji commented 2 years ago

My inclination would be to lean towards events here. I recognize that in many ways we do have a polling API, but as was pointed out earlier we already have events for several similar state changes and this feels like a natural extension of those. Especially because the gesture may use multiple hands on some platforms, as @eXponenta pointed out, so attaching something explicitly to an XRHand may be too limiting in scope.

A few additional thoughts:

@eXponenta mentioned allowing cancelling gesture execution, and I think that it's a good capability to keep in mind for other gestures but not system ones. We definitely don't want pages to be able to suppress certain critical system gestures, such as exiting immersive mode or returning to a home screen. For some other future gestures (navigation? Copy/Paste? Not really sure) it may be desirable, so at least keeping the door open to cancelable gestures doesn't seem like a bad idea.

Also, as was hinted at earlier in the thread, it may not be necessary or desirable to specifically signal that a system gesture is in progress. The more generic we can make this signal the better. I'm slightly hesitant to broadcast "The user is making a system gesture which may mean they're going to leave your experience!" because that can lead to anti-patterns around showing users "Wait! Don't Go!" messages even when it's not appropriate. I'd prefer a signal that indicated more generically "You shouldn't be allowing hand inputs to interact with your scene right now."

It's tempting to utilize the visible-blurred state for this, or simply drop hand tracking for a bit in these states to make it impossible for pages to do the "wrong" thing, but that also doesn't sound like a good user experience since at best it means abruptly switching to system-rendered hands that won't blend or depth test with the rest of the scene properly. I'm not sure what shape I'd prefer, though.

cabanier commented 2 years ago

> My inclination would be to lean towards events here. I recognize that in many ways we do have a polling API, but as was pointed out earlier we already have events for several similar state changes and this feels like a natural extension of those. Especially because the gesture may use multiple hands on some platforms, as @eXponenta pointed out, so attaching something explicitly to an XRHand may be too limiting in scope.

If both hands participate in a system gesture (like on HoloLens?), both hands would have the attribute set. I don't really like using an event because of their asynchronous nature and because it would be very noisy.

As I mentioned to @Manishearth , on the Quest browser we would trigger this each time the user's palm faces the headset. Also, what happens if the hand goes out of view while in a system gesture and then comes back in the other state?

Events are just too error prone. For instance, even though three.js is the most popular framework, it still doesn't properly handle hands that come and go intermittently.

Taking OpenXR as a reference, it also doesn't fire an event and instead sets a flag (XR_HAND_TRACKING_AIM_SYSTEM_GESTURE_BIT_FB) when the system thinks there's going to be a system gesture.
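For illustration, a sketch of how a browser might fold that per-frame OpenXR bit into the polled attribute proposed above. The bit name comes from the `XR_FB_hand_tracking_aim` extension, but its numeric value here is a placeholder and the plumbing is entirely hypothetical:

```javascript
// Sketch of browser-side plumbing (not real Quest browser code):
// each frame, the UA reads the hand's OpenXR aim status bits and
// mirrors the system-gesture bit onto the hypothetical attribute.
// Placeholder value; the real constant lives in the OpenXR headers.
const XR_HAND_TRACKING_AIM_SYSTEM_GESTURE_BIT_FB = 1 << 6;

function updateHandFromAimStatus(hand, statusBits) {
  hand.inSystemGesture =
    (statusBits & XR_HAND_TRACKING_AIM_SYSTEM_GESTURE_BIT_FB) !== 0;
  return hand;
}

const hand = { inSystemGesture: false };
updateHandFromAimStatus(hand, XR_HAND_TRACKING_AIM_SYSTEM_GESTURE_BIT_FB);
console.log(hand.inSystemGesture); // true
updateHandFromAimStatus(hand, 0);
console.log(hand.inSystemGesture); // false
```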

> @eXponenta mentioned allowing cancelling gesture execution, and I think that it's a good capability to keep in mind for other gestures but not system ones. We definitely don't want pages to be able to suppress certain critical system gestures, such as exiting immersive mode or returning to a home screen. For some other future gestures (navigation? Copy/Paste? Not really sure) it may be desirable, so at least keeping the door open to cancelable gestures doesn't seem like a bad idea.

I agree. We should not allow the session to override system gestures.

> Also, as was hinted at earlier in the thread, it may not be necessary or desirable to specifically signal that a system gesture is in progress. The more generic we can make this signal the better. I'm slightly hesitant to broadcast "The user is making a system gesture which may mean they're going to leave your experience!" because that can lead to anti-patterns around showing users "Wait! Don't Go!" messages even when it's not appropriate. I'd prefer a signal that indicated more generically "You shouldn't be allowing hand inputs to interact with your scene right now."

We talked about this a bit. What we really want to convey is that it's OK for the experience to assume that it is in control of the hand gestures and that there's no conflict.

> It's tempting to utilize the visible-blurred state for this, or simply drop hand tracking for a bit in these states to make it impossible for pages to do the "wrong" thing, but that also doesn't sound like a good user experience since at best it means abruptly switching to system-rendered hands that won't blend or depth test with the rest of the scene properly. I'm not sure what shape I'd prefer, though.

Yes, visible-blurred should only be used when system UI is up.

Manishearth commented 2 years ago

> Taking OpenXR as a reference, it also doesn't fire an event and instead sets a flag (XR_HAND_TRACKING_AIM_SYSTEM_GESTURE_BIT_FB) when the system thinks there's going to be a system gesture.

Worth noting, that flag is not set on some mutable object (OpenXR seems to avoid those too), it's set on a per-frame object.

Unless we can find a per-frame object to tack this on to, this indicates to me that we should probably use an event.

cabanier commented 2 years ago

> Worth noting, that flag is not set on some mutable object (OpenXR seems to avoid those too), it's set on a per-frame object.
>
> Unless we can find a per-frame object to tack this on to, this indicates to me that we should probably use an event.

OpenXR does have events and they chose not to generate one for this feature.

Manishearth commented 2 years ago

Yes, but they also did not make a mutable field :smile:

OpenXR has events and frames at its disposal, and it chose frames. We do not have frames at our disposal, or at least we don't have a convenient frame to hang this off of, so I'd err towards the side of using the pattern we already have (events) rather than the one we don't (mutable fields).

cabanier commented 2 years ago

As I mentioned before, events with controllers have been problematic for developers, to the point where we are now disabling events for hands because too much content will break.

Manishearth commented 2 years ago

Okay, but magical mutable fields are also error prone, since it's never clear on which frame the change happens. It's really counter to our model here: we don't do this outside of gamepads (and that wasn't our choice), and OpenXR doesn't do this, period.

If we want to avoid an event we should perhaps add a getHandState(xrhand) getter on XRFrame (probably with a better name), I'd be fine with that.
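A sketch of what that frame-scoped getter could look like; the name `getHandState` and the returned shape are illustrative, and the mock frame stands in for a real `XRFrame`:

```javascript
// Hypothetical frame-scoped getter: instead of a mutable field on
// XRHand, the frame hands out an immutable snapshot, so the value is
// unambiguously tied to one frame.
class MockXRFrame {
  constructor(systemGestureHands) {
    // Set of hands the OS considers "in a system gesture" this frame.
    this._systemGestureHands = systemGestureHands;
  }
  getHandState(hand) {
    return Object.freeze({
      inSystemGesture: this._systemGestureHands.has(hand),
    });
  }
}

const leftHand = {};
const rightHand = {};
const frame = new MockXRFrame(new Set([leftHand]));

console.log(frame.getHandState(leftHand).inSystemGesture);  // true
console.log(frame.getHandState(rightHand).inSystemGesture); // false
```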

cabanier commented 2 years ago

WebXR Layers has lots of mutable fields :-) We can define in prose how and when the field is updated, which should settle the question of which frame the boolean applies to. Note that with an event, you don't even know which frame is in this state.

cabanier commented 2 years ago

> Okay, but magical mutable fields are also error prone, since it's never clear on which frame the change happens.

Is it necessary that you know what frame it is happening on?

> It's really counter to our model here: we don't do this outside of gamepads (and that wasn't our choice), and OpenXR doesn't do this, period.

OpenXR is a C API, so it doesn't have the concept of live objects :-) My point was that they could have signaled these state changes with events, but they chose not to.

> If we want to avoid an event we should perhaps add a getHandState(xrhand) getter on XRFrame (probably with a better name), I'd be fine with that.

We discussed this at length during today's meeting, and the consensus seemed to trend towards a function on XRInputSource. (Personally, I'd still prefer an attribute.) @toji made a good point that this feels a lot like the "blurred" state, in that you still get poses but the author should only use them for drawing.
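A sketch of the shape that consensus points at: a hypothetical method on `XRInputSource` (name illustrative), treated like the "blurred" analogy above, where the app keeps drawing hands but suspends its own gesture detection:

```javascript
// Hypothetical method on XRInputSource (name illustrative, not specced).
// Like visible-blurred, poses keep flowing, but the app should use them
// only for rendering, not for interaction.
function handleHandInput(inputSource, scene) {
  // Always draw the hand so it doesn't pop in and out visually.
  scene.drawHand(inputSource);
  // Skip app-level gesture detection while the OS owns the hand.
  if (inputSource.isInSystemGesture && inputSource.isInSystemGesture()) {
    return;
  }
  scene.detectGestures(inputSource);
}

// Mock scene recording what happened.
const calls = [];
const scene = {
  drawHand: () => calls.push('draw'),
  detectGestures: () => calls.push('detect'),
};

handleHandInput({ isInSystemGesture: () => true }, scene);
handleHandInput({ isInSystemGesture: () => false }, scene);
console.log(calls); // ['draw', 'draw', 'detect']
```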