Format of Gamepad identifiers

immersive-web / webxr

Repository for the WebXR Device API Specification.

https://immersive-web.github.io/webxr/

Format of Gamepad identifiers #550

Closed toji closed 5 years ago

toji commented 5 years ago

I wanted to separate this discussion out from the rest of the gamepad-as-the-input-solution work, because it's a reasonably complex topic on its own.

To recap a bit of the prior discussion here, first off there are some guidelines that we've already set that should be considered (taken from #499):

The id MAY be 'unknown' if the type of input source cannot be reliably identified or the UA determines that the input source type must be masked for any reason. Applications should render a generic input device in this case.
Inline sessions MUST only expose ids of 'unknown'.
Otherwise the id must be a lower-case string that describes the physical input source.
It must not include an indication of the handedness of the input source (such as oculus-touch-left), as that can be determined from the handedness attribute.
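
As a rough sketch, the guidelines above could be checked along the following lines. The function name isAcceptableId and its isInlineSession parameter are hypothetical, and the handedness test is only a heuristic assumption (it looks for a "left" or "right" segment); none of this comes from the spec text.

```typescript
// Hypothetical check of the id guidelines listed above; not part of any spec.
function isAcceptableId(id: string, isInlineSession: boolean): boolean {
  // 'unknown' is always allowed, and is the only value inline sessions may expose.
  if (id === "unknown") return true;
  if (isInlineSession) return false;

  // Otherwise the id must be a non-empty, lower-case string.
  if (id.length === 0 || id !== id.toLowerCase()) return false;

  // Heuristic (an assumption): treat a "left" or "right" segment as handedness.
  const segments = id.split("-");
  return !segments.some((segment) => segment === "left" || segment === "right");
}

console.log(isAcceptableId("oculus-touch", false));      // true
console.log(isAcceptableId("oculus-touch-left", false)); // false
console.log(isAcceptableId("oculus-touch", true));       // false
console.log(isAcceptableId("unknown", true));            // true
```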

Beyond that, I had previously had text that said "For most devices this SHOULD be of the format <vendor>-<product-id>. For example: oculus-touch." But this is an unfortunately loose guideline and could result in manual work being done by the UA to expose every device. Ideally we would want a system that surfaces a name provided by the underlying platform with minimal modification.

To that end @thetuvix had previously suggested we use something like the hex representation of USB vendor and product IDs, resulting in strings such as 045e-065d. That's not human readable, but at least it's automated and consistent. My primary concern with it, however, is that not all platforms will surface their input devices in a way that provides access to USB metadata like this (and depending on the OS/device it may not have said metadata at all.)
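
To make the USB-based scheme concrete, here is a minimal sketch that formats a vendor/product ID pair as the lower-case hex string mentioned above. The name formatUsbId is hypothetical, and the sketch assumes the UA already has both numeric IDs, which (as noted) not every platform will provide.

```typescript
// Hypothetical helper: formats USB vendor and product IDs as "vvvv-pppp".
// Assumes both values are 16-bit unsigned integers already known to the UA.
function formatUsbId(vendorId: number, productId: number): string {
  const toHex4 = (value: number): string => {
    if (Math.floor(value) !== value || value < 0 || value > 0xffff) {
      throw new RangeError(`not a 16-bit USB ID: ${value}`);
    }
    // Lower-case hex, zero-padded to four digits.
    return value.toString(16).padStart(4, "0");
  };
  return `${toHex4(vendorId)}-${toHex4(productId)}`;
}

console.log(formatUsbId(0x045e, 0x065d)); // 045e-065d
```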

Some of the native APIs have built-in methods for exposing device identifiers of various forms. For example, in OpenVR you can query a "Prop_RenderModelName_String", "Prop_ModelNumber_String", and "Prop_ManufacturerName_String". Given that we expect to have multiple different devices exposing themselves through such an interface it would make sense to use those. However, that could run into some issues if, for example, the strings they use for an Oculus Touch are different than what's reported when using the Oculus API directly. Also, if the strings reported from the API have the device handedness baked in it would be difficult to scrub that out in a consistent and safe way.

Finally, some APIs may not report a device identifier at all, since there's assumed to only be a single type of controller that they'll be used with. In that case we don't have to report unknown since we would know what the device is, but it'll be up to the UA to make up a value.

I don't have any great suggestions for this aside from suggesting we just do <manufacturer>-<model>, pull those values from the API whenever possible, and if the best way to consume them is USB IDs then so be it. This is effectively the approach that Nell has been taking to the names in the Gamepad Mappings repo thus far, if you want an idea of what that would look like.

Any suggestions to allow this to be more automated/consistent are appreciated!

Artyom17 commented 5 years ago

I agree with @NellWaliczek that the name should be unique, and if we are going the route of a human-readable string (which I personally like), then besides <vendor>-<product-id> there should probably be a model name or something like that added. For example, Oculus Touch controllers exist in two versions, one for Rift and another for Quest, both named "Oculus Touch". The button mapping is the same (with one caveat: there is no touch-sensitive thumb rest, but we can still report it for compatibility). So, to identify such a controller we'd need either to change the "product-id" to something like 'touch2' or add a suffix, i.e.: oculus-touch-quest and oculus-touch-rift (for example; or oculus-touch-1 and oculus-touch-2). In this case, we can still use the substring 'oculus-touch' to identify the button mapping, and we also have enough info to determine which glTF to use to render the controllers.

toji commented 5 years ago

That's a really excellent point about the Oculus Touch branding spanning devices; I hadn't considered it before. Thank you! Of the examples you gave, I personally like oculus-touch-quest and oculus-touch-rift if we're going for human readable strings of any form, but 1/2 is workable if needed. I kinda doubt this will be a unique problem for Oculus either, as it's not unreasonable to expect that other platforms will release tweaked updates of controllers of popular devices over time. (Hasn't HTC already done this?)

In that light, maybe the appropriate pattern is something more like <manufacturer>-<model>-<optional version>. Then if you start with a 1st-gen controller with no expectation of a follow up the next variant can append the version as needed. (acme-handifier and acme-handifier-2019?)
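
One way a UA might assemble such a string is sketched below. The helper name buildId and its normalization rules (trim, lower-case, whitespace to hyphens) are assumptions for illustration, not something the thread agreed on.

```typescript
// Hypothetical helper: joins the parts into "<manufacturer>-<model>[-<version>]".
function buildId(manufacturer: string, model: string, version?: string): string {
  // Lower-case each part and collapse whitespace runs into single hyphens.
  const normalize = (part: string): string =>
    part.trim().toLowerCase().replace(/\s+/g, "-");

  const parts = [normalize(manufacturer), normalize(model)];
  // The version is optional; first-generation devices simply omit it.
  if (version !== undefined && version.trim() !== "") {
    parts.push(normalize(version));
  }
  return parts.join("-");
}

console.log(buildId("Acme", "Handifier"));         // acme-handifier
console.log(buildId("Acme", "Handifier", "2019")); // acme-handifier-2019
```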

darktears commented 5 years ago

Not super useful feedback, but I also personally prefer the human-readable string. <manufacturer>-<model>-<optional version> also seems nice if we want to support revisions of a given model.

toji commented 5 years ago

Follow up to this based on a conversation we had while investigating Chrome's implementation, and pulling from the discussion above.

It seems like there are two tiers of information we want communicated in this string. The first is button layout. This strikes me as the most important thing we need to communicate, because a fair number of experiences will use custom controller models but they'll pretty much all want to have a better understanding of the button data. Furthermore, this is data that can be less specific while still having a large impact, because we already see several instances in the ecosystem where multiple controllers with physical differences still make use of the same button layout. Examples include Windows Mixed Reality controllers (where the Samsung version is subtly different, but the actual controller functionality is identical) and Oculus Touch (where the Quest and Rift S have a different design but are functionally equivalent to the touch controllers on the original Rift).

The second tier of information is the actual physical model of the controller, which is primarily to inform the model used for rendering. This data is, by necessity, more detailed (and thus more fingerprint-worthy) but also narrower in its utility. As a result I view it as being of slightly lesser importance than the layout data, although the ideal scenario is of course that we provide both.

Additionally, we've discussed elsewhere (such as on the calls) that when we've established a controller registry (as described in #578) we probably want to include something in the id that indicates that the gamepad data conforms to a registered device mapping.

So, putting all of the above together, I wonder if we want an id layout that looks kind of like this:

<layout>:<model>:<registry-id> (Delimiters are open for bikeshedding, of course.)

The layout is a high-level indicator of what the button mapping is. For example, every Windows MR headset could use an ID of windows-mr or similar, and Oculus Rift, Rift S, and Quest could all use something like oculus-touch. These values could be returned standalone if the UA had no additional data about the model of controller used (or decided they wanted to mask it.) That wouldn't be ideal, but in most cases the fallout won't be too severe (if Samsung headsets show a generic WMR controller instead of the custom one, or the ring on a touch controller points down instead of up that's probably not the end of the world.)

If the model CAN be identified, then the returned string should strive to be as precise as possible about the device, using something like a vendor/product ID when it can. So something like windows-mr:045e-065d or oculus-touch:2019 (Like was discussed above with @Artyom17, just slightly more structured.)

Finally, if the device has an official registry entry the associated ID gets appended to the end like windows-mr:045e-065d:7 or oculus-touch:2019:14. This could logically lead to weird situations where you have an id but not a model, I suppose, and get something like open-vr::12 or :045e-065d:7 but I have a difficult time imagining a scenario where you can identify a controller precisely enough to have a corresponding registry entry but not a layout/model. Might want to just forbid that outright.
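
For illustration, an application could split such a string roughly as follows. This is only a sketch of the proposed <layout>:<model>:<registry-id> shape: the names ParsedId and parseId are hypothetical, the ":" delimiter is still open for bikeshedding, and the sketch assumes registry ids are plain decimal integers as in the examples above. It also rejects the layout-less and model-less forms that the comment suggests forbidding.

```typescript
// Illustrative shape for a parsed "<layout>:<model>:<registry-id>" string.
interface ParsedId {
  layout: string;
  model?: string;
  registryId?: number;
}

// Hypothetical parser; returns null for forms the proposal suggests forbidding.
function parseId(id: string): ParsedId | null {
  const parts = id.split(":");
  if (parts.length > 3) return null;

  const layout = parts[0];
  const model = parts[1];
  const registry = parts[2];

  // The layout is always required, so ":045e-065d:7" is rejected.
  if (layout === undefined || layout === "") return null;
  // A present-but-empty model is rejected, so "open-vr::12" fails too.
  if (model === "") return null;
  // Assumption: registry ids are plain decimal integers, as in the examples.
  if (registry !== undefined && !/^\d+$/.test(registry)) return null;

  const parsed: ParsedId = { layout };
  if (model !== undefined) parsed.model = model;
  if (registry !== undefined) parsed.registryId = Number(registry);
  return parsed;
}

console.log(parseId("oculus-touch"));           // only layout is set
console.log(parseId("windows-mr:045e-065d:7")); // layout, model, and registryId 7
console.log(parseId("open-vr::12"));            // null
```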

thetuvix commented 5 years ago

Thanks for pulling together your thoughts here!

I agree with your analysis that button layout and model are the two primary tiers of bucketing that apps are likely to need to do:

The layout buckets align reasonably with the interaction profile buckets from OpenXR (although without an explicit negotiation system at this layer to ensure forward compatibility).
The model buckets align conceptually with further extensions we expect from vendors like Microsoft to provide per-device controller render model data (with the difference here being that model is an ID for a CDN rather than a direct glTF blob or such).

Regarding registry-id, I'd expect CDNs and apps to spring quickly into action as they see new controllers with just layout:model available, and so that by itself will likely start to become a meaningful key for many purposes in production. At the point afterwards that the registry blesses specific layout and model values, is it perhaps sufficient for the registry to either just pick an existing layout:model key or designate a new layout:model key, without the extra registry-id? That would decrease the chance that apps which did the right thing when getting oculus-touch:2019 might fail some strict key lookup when they later see oculus-touch:2019:14.

Given that we won't have an explicit negotiation API here to absorb the compat risk, it may behoove us to optimize for minimal churn as controllers graduate into the registry and skip registry-id. (at least for the cases where UAs did manage to agree in advance and the registry can simply bless what was agreed on)
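
The compat risk described here can be seen in a small sketch. A strict map lookup misses once a registry id is appended, while a more tolerant lookup that drops trailing components still finds the entry. The function lookupWithFallback and the asset paths are made up for illustration; nothing in the thread prescribes this approach.

```typescript
// Hypothetical lookup: tries the full id, then drops trailing ":"-separated
// components until a key matches or nothing is left to drop.
function lookupWithFallback<T>(id: string, table: Map<string, T>): T | undefined {
  let key = id;
  for (;;) {
    const hit = table.get(key);
    if (hit !== undefined) return hit;
    const cut = key.lastIndexOf(":");
    if (cut < 0) return undefined;
    key = key.slice(0, cut);
  }
}

// Assets keyed by layout:model, as an app might have shipped them (made-up paths).
const assets = new Map<string, string>([
  ["oculus-touch:2019", "models/oculus-touch-2019.glb"],
  ["oculus-touch", "models/oculus-touch.glb"],
]);

// A strict lookup misses the id with a registry id appended.
console.log(assets.get("oculus-touch:2019:14"));
// The tolerant lookup falls back to the oculus-touch:2019 entry.
console.log(lookupWithFallback("oculus-touch:2019:14", assets));
```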

ddorwin commented 5 years ago

Whatever mechanism we come up with will be most stressed when new input devices are released. For example, it seems likely that in at least some cases:

(First-party) user agents will be developed in private and released in conjunction with new devices.
- This may require the implementer to choose some value and prevent/limit discussion about registry entries before initial release.
Third-party user agents will not be updated to recognize/support new devices before the device's release.
- Thus, user agents will need to be able to provide some reasonable value to enable users to have a reasonable experience with the new input devices.
The mappings repo will not be updated before release. Even more likely, some applications that use that (or similar) library will not have an updated copy when the device is released.
- As above, some type of reasonable behavior will be important.

In addition, third-party lower-volume input devices may be released, and it would be nice to enable users with such devices even if user agents are slow to or do not add support. (Exposing the VID-PID where possible is one way to allow user space to deal with this, but it raises privacy/fingerprinting concerns. Another option might be to expose a raw value from the runtime.)

Some other things to consider:

The types of raw values exposed for new/unknown/third-party input devices may vary between runtimes. (In the worst case, the user agent may only know the SDK/runtime that provided the input device.)
The same physical controller may be exposed differently by different runtimes.
- For example, we found that one of the Oculus buttons was not independently exposed when using OpenVR.
- Thus, the same model may have different layouts.

ddorwin commented 5 years ago

tl;dr: I propose the following for discussion:

Move input device identification from Gamepad to XRInputSource.
Expose multiple fields using interface(s).
Separate button layout and identifiers for controller rendering
Provide a fallback for each

We should try to make it as easy as possible for developers to do the right thing - or more likely, make it difficult to do the wrong thing. As an example, if you give developers a string that usually works as a direct comparison, they may do just that, which could lead to problems should values change slightly. Alternatively, developers may only check the beginning of a string (i.e., whether the first 12 characters are "oculus-touch"), which would not be future-proof.
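
A small sketch of why the prefix check is fragile. The function name is hypothetical, and apart from oculus-touch-quest (suggested earlier in this thread) the ids are made up purely to show how such a check can misfire.

```typescript
// Naive check of the kind described above: compare only the leading characters.
function looksLikeOculusTouch(id: string): boolean {
  return id.slice(0, 12) === "oculus-touch";
}

// Ids other than "oculus-touch-quest" are made up for illustration.
console.log(looksLikeOculusTouch("oculus-touch-quest")); // true, as intended
console.log(looksLikeOculusTouch("oculus-touchpad"));    // true, a false positive
console.log(looksLikeOculusTouch("acme-oculus-touch"));  // false, a possible false negative
```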

Much of the difficulty in constructing the string is the limitation of the single id string member on Gamepad. Perhaps we should instead consider an interface member on the XRInputSource. This would allow for multiple values to be exposed with a clear algorithm for how applications should use them and without concerns about parsing. It would also allow separation of button layout and controller rendering as well as allowing for base or fallback values.

In addition, moving the mechanism to XRInputSource better supports input sources that don't otherwise need a Gamepad object - as well as the possibility that some other mechanism for exposing buttons eventually supplants the gamepad attribute.

As @toji mentioned above, button layout is the most important information. It also tends to change less often than the physical appearance of controllers. Separating button layout from rendering helps ensure that the most critical value is clearly communicated while allowing for detailed identification of the model to use for rendering without concern about breaking applications entirely.

Separating the two also helps avoid ambiguity. For example, looking at the example in PR #652, what if there was a device not called Oculus Touch that used the oculus-touch layout? 2nd-gen in the model position seems to assume that the layout is part of the model.

In addition, the clear divide would make it easier for users or user agents to choose to not expose the exact controller to minimize fingerprintability while maintaining functionality. For example, if a user has a low-volume controller that is functionally equivalent to the Vive controller, this button layout could be exposed without exposing that the actual controller is the Acme Handifier.

Exposing base/fallback/class values in separate fields would enable better forwards and backwards compatibility among new input devices, user agents, mapping libraries, and applications. User agents could expose details for new controllers without concerns about breaking existing applications. For example, if the Acme Handifier 2019 has an extra button, the user agent can expose acme-handifier-2019 as the layout while also providing the well-known and well-supported acme-handifier as a fallback. Similarly, the model and fallback can be provided, allowing a reasonable representation to be provided by applications that do not have support for the new model. There could also be fallback values for the runtimes, for the case where user agents do not have information other than the runtime (i.e., openvr) that is exposing the input device. (See also below.)
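
On the application side, the name-plus-fallback idea might be consumed roughly like this. The NameWithFallback shape, the pickSupported helper, and the use of "" for "no fallback" are assumptions for illustration; the supported set is also made up.

```typescript
// Illustrative shape: a preferred name plus a single well-known fallback.
interface NameWithFallback {
  name: string;
  fallbackName: string; // "" when there is no fallback (an assumption)
}

// Returns the first of name/fallbackName that the application supports, or null.
function pickSupported(field: NameWithFallback, supported: Set<string>): string | null {
  if (supported.has(field.name)) return field.name;
  if (field.fallbackName !== "" && supported.has(field.fallbackName)) {
    return field.fallbackName;
  }
  return null;
}

// An older app that predates the 2019 revision still gets a usable layout.
const layout: NameWithFallback = { name: "acme-handifier-2019", fallbackName: "acme-handifier" };
const supportedLayouts = new Set<string>(["acme-handifier", "oculus-touch"]);
console.log(pickSupported(layout, supportedLayouts)); // acme-handifier
```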

Finally, there could be a field for raw values that could be used by libraries and applications when user agents have not yet been updated to support new devices or otherwise do not recognize an input device. This field might report the VID-PID or values returned by the runtime.

Using separate fields off of XRInputSource also allows for future extensions as necessary. For example, a color field could be added should a runtime add support for real or virtual color customization.

NellWaliczek commented 5 years ago

I really like this suggestion of not overloading the Gamepad.id field, @ddorwin! A few questions:

What do you propose we report in Gamepad.id? Fill it in with ""?
Do you have an idea of how you'd suggest breaking down the separate properties?
In the case of non-gamepad XRInputSources, what do you suggest the fields be populated with?

ddorwin commented 5 years ago

Yes, probably "". Ideally, the Gamepad spec would specify what to do in the unknown case.

There are a couple options depending on whether we want to separate the layout and model identifier structurally (naming is for illustration only):

XRInputSource
...
interface XRInputButtonLayout
DOMString name;
DOMString fallbackName;
interface XRInputModel
DOMString name;
DOMString fallbackName;
DOMString rawValue;

XRInputSource
...
interface XRInputSourceIdentifier
DOMString buttonLayoutName;
DOMString buttonLayoutFallbackName;
DOMString modelName;
DOMString modelFallbackName;
DOMString rawValue;

I think non-gamepad sources would follow similar guidelines (whatever we determine those to be). For example, buttonLayoutName could be "Cardboard", "clicker", or "single-button", and modelName could be "" or "none". As another example, modelName might be "Power Glove" with a modelFallbackName of "hand".

ddorwin commented 5 years ago

Some initial thoughts regarding the registry (#578) / registry ID mentioned above:

The registry may have multiple categories, layers, and/or cross-references. For example, there may be:
- A category of physical hardware devices where each entry defines the corresponding values for buttonLayoutName and modelName - and potentially other information (i.e., related to identification by the user agent and/or possible rawValues).
- Categories for buttonLayoutName and modelName values that contain entries for the values in the category above. Each entry would define the meaning of them for user agents and applications.
Thus, there may not be a need for a separate registry ID field as all fields (except rawValue) will be values from the registry.

Separately, note that these fields should still be useful even if the spec someday exposes a model (i.e., glTF) of the controller. At a minimum, they would serve as a fallback in the case that a user agent is not able to provide or does not have a model file (i.e., for the reasons mentioned in https://github.com/immersive-web/webxr/issues/550#issuecomment-496418220) or if a user agent chooses not to implement such functionality.

toji commented 5 years ago

Follow up from F2F: The Working Group collectively decided that something like David's proposal was a better way to address this, so I'll close down the previous PR for this issue and spin up a new one. Thanks all!

toji commented 5 years ago

Further follow up: Chatted with David about the proposal a bit more. Primary concern was that it wasn't clear if simply having a single "fallback" value for the button layouts and model would really be sufficient. Seems like having a sequence of possible mappings in order of descending preference would be more flexible. That would allow developers to easily iterate through the array looking for the first string they recognize, which should in turn be the closest approximation to the actual device that their code can actually support. For example:

inputSource.modelName = ["samsung-odyssey-plus", "samsung-odyssey", "windows-mixed-reality"]

This would communicate that the device is ideally rendered as the controller for a Samsung Odyssey+ (assuming that it has any material differences from the base Samsung headset controller, which I don't think it does, but let's not let that get in the way of a good example!) If that model isn't available, though, then the previous gen Samsung controller is fine, and if that's not available then it'll still be close enough if you render it as a generic Windows Mixed Reality controller.

Layout identifiers would work the same way, though I'd naturally expect there to be less variation in those. It would be really useful for developers of hardware on a smaller scale, though, to gain broader software support. "I'm technically a tracker-stick-2000 but if you treat me like a vive-wand then that's generally OK."

Of course, only exposing a single value is always OK, and we should generally set expectations that this array shouldn't be more than 2-3 entries long, because if it's bigger you're likely doing something wrong.
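
The iteration described above can be sketched as follows. The function name firstRecognized and the set of models the app "knows" are assumptions for illustration; the arrays reuse the example strings from this comment.

```typescript
// Returns the first name in preference order that the application recognizes.
function firstRecognized(
  names: readonly string[],
  known: ReadonlySet<string>,
): string | undefined {
  for (const name of names) {
    if (known.has(name)) return name;
  }
  return undefined;
}

const modelName = ["samsung-odyssey-plus", "samsung-odyssey", "windows-mixed-reality"];
// Assumption: this app only ships the two older models.
const knownModels = new Set<string>(["samsung-odyssey", "windows-mixed-reality"]);

console.log(firstRecognized(modelName, knownModels)); // samsung-odyssey
console.log(firstRecognized(["tracker-stick-2000", "vive-wand"], new Set(["vive-wand"]))); // vive-wand
console.log(firstRecognized(["tracker-stick-2000"], knownModels)); // undefined
```

If nothing in the array is recognized, the application would presumably fall back to a generic representation.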

Related, it's not clear if exposing some sort of rawValue would be a step too far in terms of fingerprinting data, and it's not clear how we'd advocate for that to be populated (would it be used to surface OpenXR's localized device name, for example?) so avoiding that for the moment seems prudent. If we do want to add it in the future it's feasible that it could either be added as a separate value or as the first/last item in the modelName array.

So taking all of that into account, I'd suggest that the relevant IDL would look something like this:

partial interface XRInputSource {
  readonly attribute FrozenArray<DOMString> modelName;
  readonly attribute FrozenArray<DOMString> gamepadLayout;
};

ddorwin commented 5 years ago

Thank you for the nice summary!

With that out of the way, we can bikeshed. :)

The attribute names should probably reflect that they (may) contain multiple values - and maybe how they are to be used (see below).
modelName may not accurately/unambiguously represent the purpose (rendering a model of the controller vs. being the name of the physical controller model) and looks especially weird as modelNames (see previous bullet).

When picking names and describing the attributes, we may also want to consider what it would mean to add raw value. For example, as we discussed, the order of the arrays may not necessarily be in order of specificity but rather the order in which the application should (the user agent wants the application to) check. (As an example, if the raw value is added to the end, that's because it should be the last fallback not because it is the least specific.)

For future reference, when discussing whether raw value should be first or last, we discussed that first may make sense because it is the most specific but that last might be better because applications shouldn't use that unless absolutely necessary. In addition, applications might become dependent on that value if it is first, which could cause them to break on implementations that choose to not expose such values, such as for privacy reasons.

toji commented 5 years ago

PR #695 is now up to address this issue, for all your bikeshedding needs. ;)