SPIKE: Highest Value Facial / Gestural Signals

adonahue commented 5 years ago

As a PM, I know that turning on facial / gestural recognition in our software is a complicated feature. I also believe that capturing that data would allow us to develop more sophisticated analytics that would add value to our product. I would like to understand which gestures (smiling, nodding, raising eyebrows, leaning in, etc.) are the most valuable in the short term for new Riff metrics, and whether they require any other data to be usable (e.g. raising eyebrows while smiling).

Our current implementation is client-side, and requires so much processing that it results in a poor UX. We chose to implement it client-side because 1) it is in line with our privacy position to NOT record and store user data in a way that can reconstruct the content of their conversation, and 2) sending that data to a server for processing effectively requires adding another "peer" to the call in order to capture and send the data, which is also resource intensive.

Story Acceptance Criteria: a written document that contains the following:

[ ] A recommendation of the most valuable gestures, ranked highest to lowest, based on a review of the scientific literature on how those gestures can be used to verify certain behaviors or human dynamics. I.e. which gestures have the best science around them for proving how people interact.
[ ] A recommendation about which gestures would be most valuable for Riff to capture and use in new metrics, based on their scientific veracity, other data currently or easily available to Riff, and what claims Riff can make as a result. I.e. What's the most valuable thing for us to do currently as a business, and a justification for that recommendation.
[ ] Any tradeoffs between the frequency and quality of data gathered and the ability to reliably detect the desired gesture.

jaedoucette commented 5 years ago

@adonahue I think this is close, but as written now, the use cases for the review are very broad ("used to verify certain behaviors or human dynamics"). I think that kind of scope could take weeks or months to do properly. It also depends on what quality of review we want. I can cover more if we're willing to accept a standard of "there is a model that might work well" versus identifying and understanding the state of the art in these areas.

Some more specific use cases that I think could be reviewed:
-> Determining emotion or affect from gestures
-> Determining who is speaking from gestures
-> Determining attention or focus from gestures
-> Determining comfort, trust, or familiarity from gestures
-> Determining conversational role from gestures

Can you add more about either which of those areas would be highest priority, or about what standard of review we're looking for?

ebporter commented 5 years ago

@jaedoucette @jordanreedie @adonahue The first step here might be just to see what's out there and available, then come back and report to the team about what is already supported in JavaScript. Then we can decide if any of the already implemented / supported libraries do things that we think are sufficiently interesting.

The criteria for deciding whether something might be a good candidate are:
1) Does something interesting (in the list of potential facial-gestural things we can detect)
2) Has good performance -- i.e. is demonstrated to be lightweight enough to be integrable into our client without bogging down the experience for the user
3) Isn't old, poorly written, or otherwise suspect as code and therefore unreliable (which is obviously a judgment call on the part of the developer)

So, this spike has two parts (and we'll probably just make a new one): 1) Discovery 2) Viability -- where viability is determining how to implement a library we find interesting

rifflearning / zenhub

SPIKE: Highest Value Facial / Gestural Signals #122