mozilla / action-input

INACTIVE - http://mzl.la/ghe-archive - A framework-agnostic input library that progressively handles flat, portal, and immersive web apps.
Mozilla Public License 2.0

ActionManager API for discussion #14

Closed TrevorFSmith closed 6 years ago

TrevorFSmith commented 6 years ago

This is a bit of pseudocode demonstrating the rough idea of how I envision an app writer using the action-input lib.

The basic idea is that the app writer declares (and perhaps extends) mappings from low level inputs to high level actions (perhaps via filters) in JSON files (in this code they're *.map files) and then uses the event or polling API to track actions.
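As a sketch, a map file might look something like this (the field names and path syntax here are purely illustrative, not a final schema):

{
    "name": "playing-flat",
    "bindings": [
        { "input": "/input/keyboard/key/w", "action": "/action/move-forward" },
        { "input": "/input/keyboard/key/s", "action": "/action/move-backward" },
        { "input": "/input/gamepad/0/axis/0", "filter": "/filter/dead-zone", "action": "/action/turn" }
    ]
}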

The app writer is also responsible for the logic of which maps are active at any time, so they'll probably have maps for different XR modes (flat, portal, and immersive) and for different situations like when the game is running, paused, or in a configuration menu.

I don't expect to merge this PR, just putting it here for discussion.

johnshaughnessy commented 6 years ago

The parts of this input effort that make the most sense to me, and that I think we can reach consensus on, are about how "low-level" device inputs will be accessed through high-level actions. How that will happen may be via semantic paths or something else, and they'll likely be linked together through some kind of .map file containing a binding definition.

The part of the effort that is least clear to me is how a .map file will be chosen at runtime: what role the app developer plays and what role the end user plays, whether the browser will keep state about preferences on behalf of the user, or whether the runtime will end up doing remapping of its own (with the browser sitting on top of OpenXR, for example). With that in mind, I don't know how to think about flat / portal / immersive modes built into the library.

In the near term I imagine we will all want the same capabilities with respect to submitting a binding definition and then swapping active action sets within that binding definition at runtime. So I suggest we build up from low-level to high-level actions (with filters) until we get there, and figure out a reasonable story for choosing the right binding definition for the user's device by writing applications that make use of that capability.

johnshaughnessy commented 6 years ago

It's worth comparing this to the proposed usage in 1-game.js: https://gist.github.com/netpro2k/7f91598d32476f97d24285c6d77f17d6

Some differences:

Some similarities:

TrevorFSmith commented 6 years ago

Responses to @johnshaughnessy's excellent lib comparison inline:

This PR breaks binding definitions into parts based on the action set (playing, paused, menu). (More accurately: the vocabulary used here is action map == binding definition == action set, whereas in the linked proposal action map == binding definition and an action set is a named subset of a given map.)

Ok, I'll use "action set" to mean a set of input->action mappings that are activated together. Web apps will need to toggle action sets based on display mode (flat, portal, immersive) and app-specific situation (playing, paused, config menu, etc). So, if someone is building an app that works across all three display modes and has N app situations, that means there are 3 * N possible action sets. I can easily think of web apps with 10 or more app situations, so this quickly becomes a lot to manage.
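As a sketch, an app might derive the active map from the mode/situation pair; the helper here is illustrative, though the naming convention mirrors the default map file mentioned below:

const MODES = ['flat', 'portal', 'immersive']
const SITUATIONS = ['playing', 'paused', 'menu']

function activeMapName(situation, mode){
    // e.g. activeMapName('playing', 'flat') === 'playing-flat-action-map.json'
    return `${situation}-${mode}-action-map.json`
}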

I'm not tied to separating maps into multiple files. If we want to encourage web apps to use the default mappings then we will need a way for the web author to load up the default maps and then prune or extend them with app-specific maps.

This PR separates binding definitions based on flat, immersive and portal modalities, but not based on the user's specific input device.

Yes. An important goal is to enable the web app author to write code that is dependent on high level actions instead of low level inputs. I realize that this is a tall order, and may be occasionally awkward, but this is the only way for web apps written today to work on hardware that ships tomorrow.

This PR includes custom filters at the application layer, rather than restricting them to the input library.

Yes. See comment in the code about generic vs. app-specific filters.

The linked proposal suggests how an application may trigger output events (e.g. haptic pulse)

Yes, this PR doesn't address that. The linked proposal has Action.haptic_pulse(...) but that assumes that the action comes from a device that has haptic feedback.

One of the things we have previously discussed is including in high level action events some indication of the original source of the hardware inputs that led to the action event. I'm not sure how to do that for action events emitted by filters (maybe the action event could indicate a list of the hardware inputs that fed into the filter).

I wonder if we could have Action call the hardware input source to trigger haptic feedback. Something like:

// Note: this needs to be a regular function rather than an arrow so
// that `this` refers to the Action instance that owns `sources`.
Action.prototype.sendHapticPulse = function(...params){
    for(let source of this.sources){
        if(source.hasHaptics){
            source.sendHapticPulse(...params)
            return true
        }
    }
    return false // no connected source supports haptics
}
TrevorFSmith commented 6 years ago

I agree with @johnshaughnessy that there are a lot of unknowns about how end users would edit and share configurations. I suggest that we don't try to solve that problem for V1 of this lib. Instead, we could focus on solving the general problems of how lib authors and web app writers can define and combine input->action mappings (aka binding definitions) and how best to toggle action sets at runtime based on display mode and app situation.

TrevorFSmith commented 6 years ago

Ok, this is ready for another round of review. I incorporated the feedback that I heard.

If it feels good enough to you all, I'll implement one full path from a hardware input source, up through a loaded action set of bindings, and finally out through an ActionManager's event and polling API.

cvan commented 6 years ago

thanks for addressing all the feedback. this looks quite spiffy!

a test would be good to complete the POC, but I would not say it blocks, as this is all incrementally improving work.

TrevorFSmith commented 6 years ago

Still working on an initial proof of concept. So far, it feels pretty nice but I'll need to come back with a set of questions about how to handle bindings and semantic path tracking.

TrevorFSmith commented 6 years ago

Ok, I've implemented the basic path from keyboard input (via KeyboardInputSource) up through ActionMap (which is a binding map loaded via JSON) up through ActionSet, which is used by the ActionManager.

In /example/simplest/ I've used the ActionManager in the simplest possible way, just loading the default input sources, filters, and bindings. You can run npm install, then npm run start, and then point your browser at http://127.0.0.1:8080/example/simplest/. Open the JS console, put focus back on the web page, and press the WASD keys. Look in /src/default/playing-flat-action-map.json for the JSON that sets up the bindings.
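For anyone skimming, the gist of the example is roughly this sketch; the exact import path and method names in /example/simplest/ may differ:

import ActionManager from '../../src/action-input/ActionManager.js' // path is illustrative

// construct a manager with the default input sources, filters, and bindings
const actionManager = new ActionManager()
// listener method name and action path are placeholders, not the final API
actionManager.addActionListener('/action/move-forward', (value, actionParameters) => {
    console.log('move-forward', value, actionParameters)
})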

asajeffrey commented 6 years ago

Complete XR newbie here, so my questions are a bit random!

At first glance, action-input looks like an embedded domain-specific language for dataflow graphs, where the inputs are devices (with low-level device-specific events) and the output goes to the application (with high-level application events).

At second glance though, there may be devices that are event sinks as well as event sources (e.g. haptic controllers). Ditto the application is an event source as well as an event sink?

Where do UI controls fit in this model? Are they part of the application, or are they their own event sink/sources? I'm wondering if there are devices that come with their own controls, e.g. a controller that has its own config dialog?

Is there a notion equivalent to keyboard focus? Is switching focus part of action-input or handled by another layer?

I'm sure I will have other annoying questions, but that will do for the moment :)

TrevorFSmith commented 6 years ago

@asajeffrey Welcome!

At first glance, action-input looks like an embedded domain-specific language for dataflow graphs, where the inputs are devices (with low-level device-specific events) and the output goes to the application (with high-level application events).

Yes, the ActionManager holds the InputSource and Filter instances, then uses the ActionMap to load up the DSL (in JSON form) that binds inputs to filters (which emit actions) or directly to actions. The ActionMaps are managed by the ActionSets so that the app dev can switch sets of bindings in and out as the app changes situations. So, from that perspective it is a straightforward DSL for mapping.

But, as you wrote, there are complexities that break that frame a bit.

Apps need to give haptic feedback, so when an action is activated the app needs to be able to vibrate the hardware device that led to the action. So, the actions need to be linked somehow to the originating input device (like a Vive wand or a gamepad) if possible.

Also, VR app devs need to be able to change the display of the 3D models showing the input hardware, for example when I push a thumbstick on a Vive controller forward, the 3D model of the thumbstick should also move forward. So, there needs to be some way to poll for low level hardware state in the render loop and link it to a sub-part of a 3D model of input hardware. It's still very much an open question of how to make that possible without the app code knowing about specific hardware, which then causes it to break when new hardware comes out.

Where do UI controls fit in this model? Are they part of the application, or are they their own event sink/sources? I'm wondering if there are devices that come with their own controls, e.g. a controller that has its own config dialog?

To date, the general idea is for soft controls (as opposed to hardware controls like switches) to be defined at the app level, either by the app author or as part of a separate library. It is interesting to consider what it would mean to bind these soft controls to actions, though! It's worth thinking about, certainly.

Is there a notion equivalent to keyboard focus? Is switching focus part of action-input or handled by another layer?

I think the general idea is to switch out ActionSets based on application situations, so for example if the app code decides to focus input on a text input box then it would activate an ActionSet that emitted text editing actions like "add-text" or "backspace".
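As a sketch (switchToActionSet is a placeholder name, not necessarily the actual API):

// hypothetical sketch: activate a text-editing action set while a
// text input has focus, then restore the gameplay set on blur
textInput.addEventListener('focus', () => {
    actionManager.switchToActionSet('text-editing') // emits add-text, backspace, ...
})
textInput.addEventListener('blur', () => {
    actionManager.switchToActionSet('playing')
})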

I'm sure I will have other annoying questions, that will do for the moment :)

These are in no way annoying! Keep them coming and weigh in if you have ideas of how to improve the lib.

asajeffrey commented 6 years ago

OK, next round of questions...

Can the implementor of a custom filter do so by providing a dataflow graph? E.g. if there's already filters foo and bar, can a custom filter just return the graph "feed foo into bar"? Can they return the graph "if ... { foo } else { bar }"? Can they dynamically switch between foo and bar based on other events?

In the thumbstick example, is the 3D model for the thumbstick part of the application, or part of the device configuration? (I can see an argument both ways round, e.g. my skiing simulator wants to skin the hand model with a skiing glove; my octopus controller wants to present a tentacle rather than a hand, etc.)

My question about keyboard focus had a hidden thought about WebVR, where a scene might be composited from multiple domains, and a user might really care about which domain their input is going to. (e.g. a password or similar authenticator). This gets back to my question the other day about security UX.

TrevorFSmith commented 6 years ago

Can the implementor of a custom filter do so by providing a dataflow graph? E.g. if there's already filters foo and bar, can a custom filter just return the graph "feed foo into bar"? Can they return the graph "if ... { foo } else { bar }"? Can they dynamically switch between foo and bar based on other events?

In the current code, an input can be mapped to a filter and the filter can emit a single action at a time. So, it's pretty limited in that you can't chain filters or do any kind of logic. One way to do that would be to implement a custom class that extends Filter and holds within itself a dataflow graph for logic and filtering.
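For example, a chaining filter might look roughly like this; the filter() signature here is a guess at the shape of the API, not the actual one:

// hypothetical sketch: a custom Filter that feeds the output of one
// filter into another
class ChainFilter extends Filter {
    constructor(first, second){
        super()
        this.first = first
        this.second = second
    }
    filter(inputPath, inputValue, filterParameters){
        // run the first filter, then hand its result to the second
        const intermediate = this.first.filter(inputPath, inputValue, filterParameters)
        return this.second.filter(inputPath, intermediate, filterParameters)
    }
}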

I just checked in an example that creates and uses its own custom Filter class. It's used in /example/all-custom/index.js if you want to take a look.

In the thumbstick example, is the 3D model for the thumbstick part of the application, or part of the device configuration?

The goal is to provide hardware models and metadata as part of the library and allow app devs to add their own at runtime, but that isn't implemented yet.

My question about keyboard focus had a hidden thought about WebVR, where a scene might be composited from multiple domains, and a user might really care about which domain their input is going to. (e.g. a password or similar authenticator). This gets back to my question the other day about security UX.

Yes, multiple domain experiences are a whole bag of problems that we're completely ignoring for the moment. There are many, many open questions about how we'd route input, how we'd share the graphics hardware, and how we'd let virtual apps from different domains work together to give the user a nice experience. It's like implementing an entirely new OS-level application/process/security manager, so it has been too big for anyone to bite off.

asajeffrey commented 6 years ago

Hmm, interesting, the idea is that there's explicit use of notification in the custom filters. Is there an aim to give a map/filter/fold API?

Also interesting is that the time model is the JS time model (e.g. setTimeout). Is there an aim to support time-based APIs? The poster child for such APIs is anything that uses derivation/integration, e.g. if there's a controller that measures acceleration being fed into an application that expects velocity.
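For instance, an integrating filter would have to own a time model, something like this sketch (the Filter API shape here is assumed, not the library's actual one):

// hypothetical sketch: integrate acceleration samples into velocity
// using the JS clock, which is exactly the time-model question above
class IntegratingFilter extends Filter {
    constructor(){
        super()
        this.velocity = 0
        this.lastTime = null
    }
    filter(acceleration){
        const now = performance.now() / 1000 // seconds
        if(this.lastTime !== null){
            this.velocity += acceleration * (now - this.lastTime) // v += a * dt
        }
        this.lastTime = now
        return this.velocity
    }
}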

The time model becomes complex once you put anything with significant latency into the picture, e.g. in the social app, the different clients will have different clocks that aren't tightly in sync.

The security UX is hard, we might want to keep it at the back of our minds though.

asajeffrey commented 6 years ago

Might want to mention somewhere that running the simplest example in FF needs dom.moduleScripts.enabled set to true.

asajeffrey commented 6 years ago

I was playing around with the example, just to see what the API is like. This is what I came up with... https://github.com/asajeffrey/action-input/tree/pr14/example/turtle

[animated screen capture of the turtle example]

Thoughts? (Other than that frictionless turtles are hard to drive :)

TrevorFSmith commented 6 years ago

Go, turtle, go!

Is there an aim to give a map/filter/fold API?

Yes, I think there's room to think about what a Filter API could be to handle more complex chaining. Right now the filter function is just called when an input is mapped to it by the ActionMap and then the return value of the filter function can trigger zero or one action notification. This was the simplest thing I could implement to get filters in place, and definitely not the final API.

Also interesting is that the time model is the JS time model (e.g. setTimeout). Is there an aim to support time-based APIs?

I think I'm only using setTimeout in the RandomInputSource, just so that it creates an input stream to play with. It's an open question of how to expose the origin time of input events and then actions. We're definitely interested in high precision timing for things like when a person pulls the trigger button on a wand while pointing it at something in a scene graph.

asajeffrey commented 6 years ago

Yeah, I think time is going to be one of the trickier things to deal with. Even the turtle needed position, velocity and acceleration, which is two levels of differentiation wrt time. Dealing with hardware that has its own time model, and a physics engine, eeks!

TrevorFSmith commented 6 years ago

ActionManager.queryInputPath now provides the value of specific inputs, queried by semantic path.
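Usage looks roughly like this; the path syntax is illustrative, and the value and input source info come back together since it's the combination that's useful:

// illustrative sketch: path syntax and return shape are approximate
const [value, sourceInfo] = actionManager.queryInputPath('/input/keyboard/key/w')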

TrevorFSmith commented 6 years ago

Ok, now on ActionManager.poll the ActionMaps iterate through their bindings, query for input values, and update action state based on what they find. So, there are no listeners except the ones provided by the ActionManager event API.
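So a render loop drives everything, roughly:

// sketch of the polling flow: poll once per frame so the ActionMaps
// re-query input values and update action state
function render(){
    actionManager.poll()
    // ...read action state and render the scene...
    requestAnimationFrame(render)
}
requestAnimationFrame(render)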

TrevorFSmith commented 6 years ago

I think the next step is for me to hook this into the TodoMVC application and see how it feels in an application with more complexity than the two dinky examples in this PR.

TrevorFSmith commented 6 years ago

Ok, @fernandojsg. I've addressed the concerns you raised during review. When you have a moment, please take another pass and let me know if the changes are acceptable.

fernandojsg commented 6 years ago

@TrevorFSmith I've been checking the changes you made and they look fine! But there are still some other comments that haven't been addressed, I believe because they appear as (N hidden conversations) and you may have missed them.

TrevorFSmith commented 6 years ago

@fernandojsg Ok, I've made the changes except for the changes to return values to avoid array allocations. I'll tinker a bit more with that this afternoon, hopefully committing by the end of today.

TrevorFSmith commented 6 years ago

Hey, @fernandojsg. I've looked at each of the methods that currently return an array, and in each case I'm not sure what to do that would improve the GC load. In the case of Filter.filter(...), the result will be different for each extending class, with each deciding the current action value (which could be a boolean, a float, or some other type). The actionParameters return value is an object, but it often shouldn't be modified by the filter because it's just passed through from the object defined in the ActionMap. In the case of ActionManager.queryInputPath(...), both of the return values (query path value and input source info) need to be returned together, since either value can change independently and it's the combination that is useful. I could change it to return a new object with two keyed values, but I'm not sure that's better for the GC. So, I definitely hear you about wanting to avoid making new arrays on every tick; I just don't know how to change the API to improve it. I'm open to specific ideas or edits, though!

fernandojsg commented 6 years ago

That's a common problem and indeed not easy to solve. Depending on the use case, one solution could be to have a cached object created on the class whose method returns an object or array, then populate that same object/array with the data and return it on each call. As many of these functions are called per frame, I believe it is worth looking at alternatives in how the functions are used to try to avoid that behaviour. In any case, that's something we could do after this gets merged, moving the discussion to another thread.
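For example, a minimal sketch of that cached-result pattern (not code from this PR):

class InputQuery {
    constructor(){
        // one array reused across calls to avoid per-frame allocation
        this._result = [null, null]
    }
    query(value, sourceInfo){
        this._result[0] = value
        this._result[1] = sourceInfo
        return this._result // callers must copy if they need to keep it
    }
}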

Thank you very much for the fast changes, I just went through them and they look fine.

Please let me test the api on the examples and I'll be back with more feedback on the ergonomics.

fernandojsg commented 6 years ago

@TrevorFSmith Looking great! I've been looking around the example with the latest changes and I believe it's ready to merge. There are still some considerations, as we discussed previously, on how to return multiple values, but I assume the API will change in the following PRs as we keep using it. It could be nice to get some profiling on it later on, once we have bigger examples, so we could identify whether these issues are hitting hard on the GC or performance.

Overall really nice job done here :+1: thanks!

/cc @johnshaughnessy @cvan any objection preventing this PR to get merged?

TrevorFSmith commented 6 years ago

Yes, I'm also assuming that the API will change. This is definitely a 0.1.0 instead of a 1.0!

TrevorFSmith commented 6 years ago

@fernandojsg Ok, it sounds like there are no objections. Time to merge?

fernandojsg commented 6 years ago

Yeahhh!! :)

fernandojsg commented 6 years ago

🎉🎉🎉🎉🎉

TrevorFSmith commented 6 years ago

Boom!

Thanks for all of your help, Fernando.

fernandojsg commented 6 years ago

Thank you for the amazing work you have done here! 👍