HoloToolkit Input Module: design discussion

maxouellet commented 7 years ago

This is the initial suggested design for the new input module. Feel free to ask questions / request changes.

I already have a version of this implemented, to which I will make improvements as needed. Goal is to add it to HoloToolkit so that new features can be added by the community as needed. It has been designed to be as extensible as possible.

Why design a new Input Module?

A lot of applications have a need for an efficient way to handle input. In an ideal world, game objects can handle the input events they are interested in without having to write the same code in multiple components. The current solution in HoloToolkit isn't very performant and relies on Unity's messaging system, which is not very performant and doesn't make it obvious that behaviours are handling input events. It also doesn't extend well to support multiple different input mechanisms.

Unity provides an event system that it currently leverages to send out inputs to game objects. It contains a HoloLensInputModule that interprets HoloLens input and sends out the standard Unity events. The solution I propose does not use this input module, for the following reasons:

It does not easily support gaze stabilization: instead, it always assumes that you are targeting the center point of your camera, using a raycaster attached to your camera.
It only supports sending out the standard Unity events, which are 2D events. This lacks some of the key information that is needed to manipulate 3D objects with input devices that have more than 2 degrees of freedom.

The proposed design leverages Unity's event system, but implements its own version of an input module (called InputManager in this case). This currently gives us more flexibility in exploring various interaction mechanisms, which is very helpful when working with novel inputs devices such as what HoloLens provides.

Class diagram

Overall view of the main classes and interfaces that make up the initial version of this input module. Minor changes could still be done before submission (for example, HandsInput will likely be merged with GesturesInput, as their functionalities are very similar).

inputmodule

Input Module Design

The input module is designed to be extensible: it could support various input mechanisms and various types of gazers.

Each input source (hands, gestures, others) implement a IInputSource interface. The interface defines various events that the input sources can trigger. The input sources register themselves with the InputManager, whose role it is to forward input to the appropriate game objects. Input sources can be dynamically enabled / disabled as necessary, and new input sources can be created to support different input devices.

Game objects that want to consume input events can implement one or many input interfaces, such as:

IFocusable for focus enter and exit. The focus can be triggered by the user's gaze or any other gaze source.
IHoldHandle for the Windows hold gesture.
IInputHandler for source up, down and clicked. The source can be a hand that tapped, a clicker that was pressed, etc.
IManipulationHandler for the Windows manipulation gesture.
ISourceStateHandler to handle when an input source is detected or lost. This can be triggered when a hand comes into view of the HoloLens, for example.

The input manager listens to the various events coming from the input sources, and also takes into account the gaze. Currently, that gaze is always coming from the GazeManager class, but this could be extended to support multiple gaze sources if the need arises.

By default, input events are sent to the currently focused game object, if that object implements the appropriate interface. Modals input handlers can also be added to the input manager: these modal handlers will take priority over the currently focused object Fallback handlers can also be defined, so that the application can react to global inputs that aren't targeting a specific element. Any event sent by the input manager always bubbles up from the object to its ancestors.

In recap, the input manager forwards the various input sources events to the appropriate game object, using the following order:

The registered modal input handlers, in LIFO order of registration
The currently focused object
The fallback input handlers, in LIFO order of registration

The input manager also has a pointer to the currently active Cursor, allowing it to be accessed from there. The cursor currently also depends on the gaze coming from the GazeManager class.

StephenHodgson commented 7 years ago

LIFO?

maxouellet commented 7 years ago

Last In, First Out. It's a stack

genereddick commented 7 years ago

When will this be ready for testing?

maxouellet commented 7 years ago

Good question. I'm aiming for the week of the 24th. Current plan is to initially put this in a separate branch, give it a few weeks to communicate the breaking changes + stabilize as needed, and then RI into master.

I have this working locally on a few projects, but there's some cleanup I need to do beforehand to cleanly integrate it. Notably, this comes with some other utility fixes and additions that might have ripple effects, such as a modified Singleton that doesn't call FindByType.

StephenHodgson commented 7 years ago

@maxouellet Any News?

aalmada commented 7 years ago

Have you considered using Reactive Extensions (UniRx)? It allows compact and robust code specially when handling asynchronous inputs from multiple sources. Here is a reactive version of the GazeGestureManager for the Origami sample: https://github.com/aalmada/OrigamiRx/blob/master/Assets/Scripts/GazeGestureManager.cs

StephenHodgson commented 7 years ago

I was considering UniRx as well, but I didn't like that it was essentially using Linq.

Were you able to record definitive performance tests?

aalmada commented 7 years ago

The Linq performance issue was already raised in the Hololens forum and here is the clarification by the UniRx developer: https://forums.hololens.com/discussion/comment/7938/#Comment_7938

maxouellet commented 7 years ago

@HodgsonSDAS I'll likely get started on this today or tomorrow. I have all the code written for a first version, it's just a matter of cleaning it up so that i can publish it externally. My schedule is clearing up this afternoon, so hoping to have a first pull request out by end of the week,

@aalmada I have not considered UnirRx. My understanding is that we want to keep HoloToolkit as light and standalone as possible, and also make it simple to use for new developers. While I have no doubt that UniRx is great, I'm not convinced that we should add it as a mandatory dependency to the core HoloToolkit. That being said, this can definitely be brought up in a separate discussion if someone is willing to do the work. Maybe it could be an optional extension to HoloToolkit, or something like that.

NeerajW commented 7 years ago

I agree with Max. We should keep HoloToolkit strongly typed, easy to read code and extensible. Personally, I felt UnirRx did not meet the readable code criteria. I don't doubt its power but I'd rather not take a dependency on that less known design pattern. Interface/Singleton based patterns are understood by all. If not then they can be easy to ramp up to. Hope this helps!

darax commented 7 years ago

@maxouellet are you planning on switching back to the singleton that requires setting the instance in awake?

NeerajW commented 7 years ago

Max here are some questions/thoughts based on the interface diagram:

Gaze: Since we have named all classes so far related to Gaze to be Gaze*something does it makes sense to call IFocusable to be IGazeable and OnGazeEnter and OnGazeExit? This might have the transition easier as well for current code bases.

Gesture:

Would it make sense to define all interfaces for gestures? And not just Hold/Manipulation.
ISourceStateHandler Vs IInputHandler: Looks like these split out the UnityEngine.VR.WSA.Input.InteractionManager functionality. I think the names are confusing. Perhaps keep the interface names closer to the actual API? For example: It can be confusing to know which interface to use for what in this design. Unless I misunderstood something?

Voice:

I don't see much around that. Any thoughts?

aalmada commented 7 years ago

I totally agree with @maxouellet reasoning but not with @NeerajW . Rx is strongly typed, easier to read and easier to extend. For these same reasons so many people are now adopting functional languages. I already started working on a reactive version of HoloToolkit but I'm still trying to figure out the best way to take advantage of your work without replicating it. If it was natively reactive it would be much simpler and that's why I raised the issue. Thanks a lot for the feedback and for all the amazing work.

maxouellet commented 7 years ago

@darax Yes, but we have improvements to it so that it automatically sets itself up in Awake. There are two drawbacks to this, but at least it's better coding practice than doing FindByType across your whole scene.

If your singleton class needs to declare an Awake, it has to call base.Awake() to set the instance.
Singletons are only guaranteed to be set in Start(), so you should never call into a Singleton in Awake. This is consistent with the expectation that Awake is to set-up your class, and Start is to setup its dependencies and behaviors.

darax commented 7 years ago

var is the opposite of strongly typed. :)

Having a function like .RefCount is a hint that people working on it will need to understand object lifetime of Rx. This makes it harder to extend without introducing leaks or crashes.

I don't consider Linq to be more readable. The script you posted is hard for me to follow. What does .Publish() do? How does the focused object get set? I'd expect to see a raycast somewhere to set the focused object, but I don't see that.

I like where it looks like Max is going with the interfaces, and I'm sure that Rx can be wrapped around that once he's done.

darax commented 7 years ago

FindByType is only called once per singleton. I'm not sure why that's a problem.

StephenHodgson commented 7 years ago

I agree, Linq isn't explicitly typed and can be confusing to read at first, but the definitive answer lies in the hard data of performance testing.

StephenHodgson commented 7 years ago

@maxouellet how different is this new singleton generic? What was wrong with the current implementation?

maxouellet commented 7 years ago

I'll make it simpler by just sharing out what we're using right now.

    /// <summary>
    /// Singleton behaviour class, used for components that should only have one instance
    /// </summary>
    /// <typeparam name="T"></typeparam>
    public class Singleton<T> : MonoBehaviour where T : Singleton<T>
    {
        private static T instance;
        public static T Instance
        {
            get
            {
                return instance;
            }
        }

        /// <summary>
        /// Returns whether the instance has been initialized or not.
        /// </summary>
        public static bool IsInitialized
        {
            get
            {
                return instance != null;
            }
        }

        /// <summary>
        /// Base awake method that sets the singleton's unique instance.
        /// </summary>
        public virtual void Awake()
        {
            if (instance != null)
            {
                Debug.LogErrorFormat("Trying to instantiate a second instance of singleton class {0}", GetType().Name);
            }
            else
            {
                instance = (T) this;
            }
        }

        public virtual void OnDestroy()
        {
            if (instance == this)
            {
                instance = null;
            }
        }
    }

StephenHodgson commented 7 years ago

I like it, but we will need to make sure everyone calls the base classes when utilizing it. I wish there was a way to make calling the base class required.

maxouellet commented 7 years ago

You get a warning in the editor if you override Awake without specifying "override". Also, if you don't call base.Awake(), you will get a bunch of null exceptions every time you try to access the instance.

timGerken commented 7 years ago

should Awake and OnDestroy be marked as "protected" instead of "public"

StephenHodgson commented 7 years ago

What happens if someone doesn't call base.OnDestory()?

timGerken commented 7 years ago

I've also seen this pattern, in the (instance != null) case in Awake() call DestroyImmediate(this) just to be sure.

maxouellet commented 7 years ago

@timGerken Yes it should be protected, the reason it is public is that when this was written at a time where there was a bug in Unity where protected inheritance caused a runtime exception. I'd fix it before submitting :)

If you don't call base.OnDestroy, you have a dangling static reference, which I think will eventually be cleaned up by the garbage collector. Realistically, this is only useful in cases where you want to destroy a Singleton before your app closes. That's already the case with the existing Singleton.

I'm not sure there is a need to call DestroyImmediate in this case, but not strongly opposed to it either.

timGerken commented 7 years ago

Should this Singleton pattern survive Unity scene transitions? If so, in Awake, it should also call DontDestroyOnLoad(this);

StephenHodgson commented 7 years ago

@timGerken I think that should be optional. Generally this is a separate script you just attach to your parent object anyway.

StephenHodgson commented 7 years ago

@maxouellet I think documenting the required call to the base class should be good enough.

maxouellet commented 7 years ago

@NeerajW Gaze: It's purposefully not called Gaze because it isn't necessarily tied to gaze. Part of the goal of this design is to be flexible around input mechanisms, so I wouldn't want to strongly tie it to Gaze.

Gesture events: Yes. I think only Navigation was originally missing in there. The reason its missing is because a GestureRecognizer can only support either manipulation or Navigation. I'll likely add Navigation by creating a second GestureRecognizer for that purpose.

Merging ISourceStateHandler and IInputHandler: I don't have strong feelings about that, I think both sides could be argued. I originally made them separate because I'd rather have smaller interfaces than a big interface that forces you to have 5 methods even though you might only care about one of them. This is consistent with Unity's event system interfaces: in fact, they push it even more by having one interface per method.

Voice: This does not aim at addressing voice for now.

maxouellet commented 7 years ago

@timGerken It's not in there by design, because we've had cases where we only wanted a singleton for a single scene. I haven't played with it much, but what I've typically done is have a "DontDestroyOnLoad.cs" MonoBehaviour that calls DontDestroyOnLoad(this) in Start. That makes it easy to make any GameObject survive scene transitions, without having to add that logic to many of your scripts.

Otherwise you can end up with many singletons calling DontDestroyOnLoad on the same GameObject. Or you can end up with undesired behaviours when reusing components across projects. Making it optional like @HodgsonSDAS suggested would also work if people feel that's really valuable.

NeerajW commented 7 years ago

@maxouellet sounds good on the GGV discussion. Thanks for considering!

aalmada commented 7 years ago

@darax Not knowing Rx is one thing but saying the "var is the opposite of strongly typed" is shocking! This project needs some discussion forum. What about Gitter just like many other .NET project?

StephenHodgson commented 7 years ago

@aalmada is right. Technically all of C# is strongly typed anyway, but what we're really trying to say it's not explicit.

In general, if conflicting types can be decided at compile time (resulting in a compiler error), it's strongly typed. If it takes the runtime to discover it (resulting in an Exception), it's weakly typed.

MSDN for Linq Type Relationships:

LINQ query operations are strongly typed in the data source, in the query itself, and in the query execution.

MSDN for reference for var:

An implicitly typed local variable is strongly typed just as if you had declared the type yourself, but the compiler determines the type.

var i = 10; // implicitly typed

int i = 10; // explicitly typed

I think it's important to keep types explicit for easy readability, easily spot anonymous types, and faster code review during PRs.

paseb commented 7 years ago

@maxouellet and @NeerajW, I strongly agree with moving away from the term gaze to focus. Gaze has a single use target where focus has a broader understanding for multiple targets. Example: targeted interactions from the hand forward or other device. You can allow for multiple "focusers" where "gaze" implicitly has the understanding of originating from the head.

I largely agree with all of the Input changes though the cursor should also be abstracted accordingly. Having an interface -> abstract base class for input sources should also be used for cursors. Trying to model it after the shell cursor from the start doesn't provide the base for other cursors (sprites, material, lights... etc).

All in all awesome work @maxouellet!

StephenHodgson commented 7 years ago

@maxouellet any chance we get to play with this over the weekend?

maxouellet commented 7 years ago

@HodgsonSDAS Merging internal changes with the public version took more time than expected, but I'm back to a state that compiles. I still need to make sure all the tests and example scenes still work, and fix them as appropriate, which I expect is going to take most of today.

I'll try to have some early version (potentially without all example scenes working and with some changes still needed) up on my GitHub account today so that you can play with it. Will update this thread with my progress towards the end of the day.

maxouellet commented 7 years ago

I've pushed an early first version of this on my GitHub, so that you can play around with it. There's likely still a bunch of broken scenes, mainly references that broke when I deleted / moved / added when I moved from our internal project to the public HoloToolkit project. The best scene to try out the new features is the "InputManagerTest" scene, which shows most of the new functionalities.

I still need to clean up a lot of the code, such as adding the final file header, making the singleton methods protected and adding support for the navigation gesture. I'll also go through all the example scenes to make sure everything works before I send out an official pull request.

One deficiency that I've noticed is the ability to send voice commands to the focused object. I've fixed the test scenes that were doing that, but it's not as clean as I'd like it to be. I'm not 100% sure how to address these scenarios yet, so it will likely be addressed in a future update of that input module.

StephenHodgson commented 7 years ago

Overall. Wow, great work! I'm really excited to use this. I do have a few thoughts though.

I noticed in the GazeManager.cs that we're prioritizing hits during the update loop. Should we be using a for loop instead of foreach since this is hot code?

In BaseInputSource.cs in the Events wrappers region the RaiseSourceDetectedEvent and RaiseSourceLostEvent raises the opposite event. Is this intentional?

In Button.cs in Awake we set a hash to trigger the dehydrate animation:

Do we hydrate when we set the gameobject.active(true); in the animation?
Should we set gameobject.active(false); after we dehydrate?

InputTest.cs is missing the OnPointerEnter Debug line.

Other than that I think most everything else is cosmetic, like preferring explicit vars, protected overrides, and extra lines.

BTW, the popup menu test script is really cool. I'd almost say it belongs in the Holotoolkit.

maxouellet commented 7 years ago

Thanks a lot for the feedback @HodgsonSDAS . Most of what you mentioned are likely bugs, I'll look into them today.

I'm not super familiar with Button.cs, and might end up moving it to the Tests folder because I'm not convinced it's generic enough yet. The intend as I understand it is that the button states (including its active state) is completely controlled by its associated Animator state machine: the script mostly contains logic to toggle between these animations. The hash that is retrieved is just used to optimize modifying modifying the "Dehydrate" parameter on that animator (Unity docs).

One issues that was pointed out by @NeerajW is that InputManager currently allocates the event data classes at each frame. I'll try to take a look at this and see if I can somehow improve this.

StephenHodgson commented 7 years ago

@maxouellet Yeah the button class you had looks very similar to the one we had while working on our project at Microsoft earlier this year. We ended up making the hydrate and dehydrate animations completely automated based on the GameObject's active flag, but still included the option to hydrate in and out independently of the active status. One caveat though is that we had to explicitly call a public method that in turn triggered the dehydrate animation then deactivated the GameObject after it completed instead of just calling SetActive(false).

aalmada commented 7 years ago

@maxouellet Tried your version and it's a great improvement. No more script execution order issues.

On the explicit typing, I understand your reasoning but it's arguable: https://blogs.msdn.microsoft.com/ericlippert/2011/04/20/uses-and-misuses-of-implicit-typing/

StephenHodgson commented 7 years ago

However, the use of var does have at least the potential to make your code more difficult to understand for other developers. For that reason, the C# documentation generally uses var only when it is required.

-Implicitly Typed Local Variables (C# Programming Guide)

I think it's more important to make the code easier for others to understand because this is Open Source. From my understanding the HTK was initially for newbies who are just starting out in Unity & HoloLens.

maxouellet commented 7 years ago

I just pushed some more changes to my repository to fix the bugs that @HodgsonSDAS had pointed out, and do the additional fixes:

Added support for the Navigation event
Fixed the headers that were not in the MIT license format
InputManager no longer allocates for every event that is sent out
Removed old "Main Camera" prefab and associated camera movement script, since we now have the new HoloLensCamera prefab

Last thing I want to do before sending out an official pull request is to take a stab at merging HandsInput.cs and GesturesInput.cs, while would unify hands input with over other Windows-supported type of input.

StephenHodgson commented 7 years ago

Really digging that InputManagerTest scene.

There are some issues with the cursor animations though. I was able to trigger the wait animation and then get it stuck on the ring idle by triggering other animations. You may need to reset your triggers before setting them again if the animation gets cut off part way though completion.

I noticed the cursor rotates 180° on the X axis when you gaze off then on the two larger scaled button examples. Is this intentional to show that the cursor will always be visible above the button? If not, the objects cursor modifier normal z should be 0 not -1. The scale of the cursor and the scale of the buttons is misleading. Others may think that the scale of the button is what is affecting the scale of the cursor when it's just the cursor modifier doing all the work.

I noticed you have a snapping mechanism on the popup menu interaction. Should we describe that somewhere in the example as well?

I noticed that the cursor asset you dropped in is the one Greg Bahm made for project Carbon. It still contains animations and geometry specific to that project. You may want to remove the project specific components.

maxouellet commented 7 years ago

Good points.

We have a better implementation of the cursor coming in soon from @paseb, so the new cursor on my GitHub is really just a stand-in. I'll take a quick look at cleaning it up in the meantime, but odds are it simply won't be part of the pull request.

We could add some description in the example scene, but it's already pretty heavy on text. It's meant as a demo that shows the various capabilities of the input system and the cursor system, so it's kinda by design that users should explore the scene and look at the GameObjects to understand the behaviours they might be interested in having in their own project. My suggestion would be to leave it as-is for now, and we can always eventually add a ReadMe.md or something like that as necessary.

StephenHodgson commented 7 years ago

@maxouellet you also have two extra meta files that can be deleted from your repo. The Editor complains when I switch over to your branch each time, saying they've been deleted because there aren't any assets they're related to.

Removing Assets/ThirdParty/HoloToolkit/Input/Models/Cursor/cursor_hand_ready.fbm/templates because the asset does not exist
Removing Assets/ThirdParty/HoloToolkit/Input/Models/Cursor/cursor_ready.fbm/templates because the asset does not exist

maxouellet commented 7 years ago

Thanks for the notice, I pushed a change that removes these two files.

The input stuff is pretty much done at this point, minus the cursor. Goal is to get the cursor in today or tomorrow at latest, and send out a pull request once that's done. I also need to update the documentation, which I will do today.

aalmada commented 7 years ago

@maxouellet You don't need so many delegates for the input sources. Two are enough and you don't even need to declare them. Use Action<> in the event declaration as suggested in the .NET design guidelines for callbacks:

event Action<IInputSource, uint> SourceUp; event Action<IInputSource, uint, Vector3> ManipulationStarted;

These don't follow the .NET design guidelines for events but it would make the sender not strongly-typed.

maxouellet commented 7 years ago

@aalmada In this specific case, I'm not a big fan of using Action, because it hides the meaning of the uint. I can certainly change it if people feel it's more obvious when using Action.

It agree that EventHandler and event args for the various events would be better overall, but I was trying to avoid the allocation of an *EventArgs object instance on every interaction event, given that this will be running in Unity on HoloLens. I'm open to suggestions on this

aalmada commented 7 years ago

@maxouellet Come on! Everything that is not C# 1.0 is not readable? Generics is 2.0. C# is now 6.0 and Unity supports up to 4.0. At least you could reduce to only two delegates...

microsoft / MixedRealityToolkit-Unity