relientm96 / capstone2020

Gesture Recognition using Machine Learning. Documentation: https://relientm96.github.io/capstone2020/

Gesture Mapping Research (papers) #23

Closed nivlekp closed 4 years ago

nivlekp commented 4 years ago

Research into different strategies for mapping gestures to music. This issue should focus on published papers.

Y1Ck commented 4 years ago

How do I contribute to this section? Create a feature branch to document the researched papers?

relientm96 commented 4 years ago

I would think a good way to do it is to just list it here as a comment? Cause then we can always refer back to this?

Then we can combine all the information into a single md document, summarizing each paper.

nivlekp commented 4 years ago

Here is a summary from Tanaka. My comment is italicized.

Defining the instrument

Miranda and Wanderley describe an interactive musical instrument as a system comprising three basic subsystems: a gestural controller (the input device capturing the performer's gestures), a sound production unit (the synthesis engine), and the mapping layer linking the two.

_I like how this model kind of matches our system that we drew a few weeks ago._

Mapping

Hunt and Wanderley propose four types of mappings: one-to-one, one-to-many (divergent), many-to-one (convergent), and many-to-many.

From Mapping to Musical Phrases

To allow the articulation of musical phrases (including phrases from a music theory point of view, meta-events, and complex musical structures), Tanaka proposes a model of at least three levels of mapping that work in conjunction to articulate a musical unit. The proposed mapping types are:

  1. Binary mapping
  2. Basic parametric mapping(s)
  3. Expressive mapping(s)

An example would be the violin, where the contact of the bow with the string is the binary mapping, the selection of the string length is the basic parametric mapping, and vibrato is an expressive mapping. _More research/clarification is required, but for now, please refer to Tanaka for more detail._
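To make this concrete for a gesture-tracking input like ours, here is a minimal Python sketch of the three levels as separate functions. The `HandFrame` fields and the frequency ranges are illustrative assumptions, not from Tanaka.

```python
from dataclasses import dataclass

@dataclass
class HandFrame:
    hand_present: bool   # is a hand detected at all?
    palm_y: float        # normalised vertical palm position, 0.0..1.0
    shake_amount: float  # short-term palm jitter, 0.0..1.0

def binary_mapping(frame: HandFrame) -> bool:
    # Level 1: note on/off, analogous to the bow touching the string.
    return frame.hand_present

def basic_parametric_mapping(frame: HandFrame) -> float:
    # Level 2: pitch selection, analogous to choosing the string length.
    return 220.0 + frame.palm_y * 440.0   # map palm height to 220-660 Hz

def expressive_mapping(frame: HandFrame) -> float:
    # Level 3: vibrato depth, analogous to vibrato on the violin.
    return frame.shake_amount * 0.05      # up to +-5% pitch modulation

frame = HandFrame(hand_present=True, palm_y=0.5, shake_amount=0.2)
if binary_mapping(frame):
    print(basic_parametric_mapping(frame), expressive_mapping(frame))
```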

To realise these three levels of mapping, Tanaka suggests using complex mappings or compound mappings.

Complex Mappings

One sensor input is processed in different ways (e.g. by applying different filters) to realise the different levels of mapping (binary, basic, expressive). This would be categorised as "1 to many".
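A minimal sketch of such a "1 to many" mapping, assuming a single scalar sensor stream (e.g. one flex-sensor value per frame); the filters and thresholds are placeholders, not taken from Tanaka:

```python
def complex_mapping(history):
    """Derive all three mapping levels from one sensor by filtering it differently."""
    current = history[-1]
    recent = history[-5:]
    smoothed = sum(recent) / len(recent)      # low-pass filter: slow variation
    jitter = max(recent) - min(recent)        # high-frequency content: fast variation
    return {
        "binary": current > 0.1,   # gate: is the sensor active at all?
        "parametric": smoothed,    # slowly varying value -> e.g. pitch
        "expressive": jitter,      # fast variation -> e.g. vibrato depth
    }

sensor_history = [0.0, 0.2, 0.5, 0.55, 0.6]
print(complex_mapping(sensor_history))
```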

Compound Mappings

Multiple sensor inputs are processed to produce one musical event. It seems like he was implying that the relationship between the sensor inputs determines which sensor input is mapped to which level (binary, basic, expressive). More research is required (potentially with the keyword "multimodal interaction"). An example would be a Nintendo Wii remote, where the buttons, infrared camera, and accelerometer are used to articulate a single musical event. This would be categorised as "many to 1".

It seems like Compound mappings would suit our purpose.
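A minimal sketch of a "many to 1" mapping along the lines of the Wii remote example; the input names and the `MusicalEvent` fields are assumptions for our system, not from the paper:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MusicalEvent:
    pitch: float      # Hz
    velocity: float   # 0.0..1.0
    vibrato: float    # 0.0..1.0

def compound_mapping(button_pressed: bool,
                     camera_hand_y: float,
                     accel_magnitude: float) -> Optional[MusicalEvent]:
    # Binary level: the button decides whether an event happens at all.
    if not button_pressed:
        return None
    # Basic parametric level: the camera-tracked hand position selects the pitch.
    pitch = 220.0 + camera_hand_y * 440.0
    # Expressive level: accelerometer energy shapes velocity and vibrato.
    energy = min(accel_magnitude, 1.0)
    return MusicalEvent(pitch=pitch, velocity=energy, vibrato=0.3 * energy)

print(compound_mapping(True, 0.4, 0.7))
```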

nivlekp commented 4 years ago

Here is a summary from Rovan et al. My comment is italicized.

Instrumental Gestural Mapping Strategies as Expressivity Determinants in Computer Music Performance

Mapping Strategies

A classification of mapping strategies (from gesture to sound production) into three groups is proposed: one-to-one, divergent (one-to-many), and convergent (many-to-one).

_This paper seems to provide just a little bit of detail on the three types of mapping mentioned in Tanaka. However, the rest of the paper analyses gestural mapping from a traditional-instrument perspective (in this case mapping traditional clarinet playing gestures to a digital clarinet), and our application might not necessarily be bound by this. If necessary, we can revisit this paper._

nivlekp commented 4 years ago

Here is a summary from Modler. My comment is italicized.

Neural Networks for Mapping Hand Gestures to Sound Synthesis Parameters

This paper explores mapping hand gestures to musical parameters in an interactive music performance and virtual reality environment.

System Architecture

The interactive system comprises the following components.

User Input Device: The Sensor Glove

As a first approach, the author uses direct mapping of single sensor values to sound parameters _(this corresponds to one-to-one mapping in Rovan et al. and is similar to what I did in the demo video)_. Although good results have been obtained concerning the possibilities of controlling parameters of the sound algorithm (FM granular synthesis, analog synthesis), a drawback of such a direct connection is the difficulty of intuitively controlling multiple parameters simultaneously. Therefore, the author feeds the data from the sensor glove into a postprocessing unit which provides feature extraction and gesture recognition abilities _(since gestures from different fingers are coupled to form a single hand gesture, would this correspond to convergent mapping?)_.
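A rough sketch contrasting the two stages, with assumed sensor and parameter names (not taken from the paper): a direct one-to-one mapping of raw glove values, versus a small postprocessing step that extracts hand-level features before mapping.

```python
def direct_mapping(flex_sensors):
    # One-to-one: each raw sensor value drives exactly one synthesis parameter.
    return {
        "grain_rate": flex_sensors[0] * 100.0,
        "grain_size": flex_sensors[1] * 0.5,
        "filter_cutoff": flex_sensors[2] * 8000.0,
    }

def postprocessed_mapping(flex_sensors):
    # Feature extraction couples the fingers into hand-level features first,
    # which is closer to convergent (many-to-one) mapping.
    openness = sum(flex_sensors) / len(flex_sensors)   # how open the hand is
    spread = max(flex_sensors) - min(flex_sensors)     # how uneven the fingers are
    return {"brightness": openness * 8000.0, "modulation_depth": spread}

glove = [0.2, 0.8, 0.5]
print(direct_mapping(glove))
print(postprocessed_mapping(glove))
```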

The Concept of Symbolic and Parametric Subgestures

The author assumes that a gesture consists of subgestures of a symbolic and a parametric nature. The symbolic category may include static poses as well as non-static subgestures, including time-varying gestures which carry symbolic meaning for the performer. The parametric type should always be time-variant for the control of sound parameters. With this subgesture architecture, sequences of gestures can be recognized using only a part of the hand as a significant symbolic sign, while other parts of the hand movement are used as a parametric sign. _(This seems to correspond to "compound mapping" in Tanaka.)_

An example would be: a straightened index finger selects a sound object (symbolic subgesture) _(whatever this means; e.g. a module in Pure Data that generates a violin sound)_, while the hand position determines the volume and pitch of the object (parametric subgesture).
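A minimal sketch of this example, splitting one hand frame into a symbolic subgesture (which sound object is selected) and parametric subgestures (volume and pitch from hand position). The thresholds, field names, and sound objects are chosen purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hand:
    index_extension: float  # 0.0 = curled, 1.0 = fully straightened
    x: float                # normalised horizontal hand position, 0.0..1.0
    y: float                # normalised vertical hand position, 0.0..1.0

def symbolic_subgesture(hand: Hand) -> str:
    # A straightened index finger selects one sound object, otherwise another.
    return "violin" if hand.index_extension > 0.8 else "drum"

def parametric_subgestures(hand: Hand) -> dict:
    # Hand position continuously controls the selected object's volume and pitch.
    return {"volume": hand.y, "pitch": 220.0 + hand.x * 440.0}

hand = Hand(index_extension=0.9, x=0.25, y=0.6)
print(symbolic_subgesture(hand), parametric_subgestures(hand))
```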

The author uses a dedicated neural network architecture to extract time-varying gestures (symbolic gestures), and uses the trajectories of certain sensors or the overall hand velocity to extract parametric values (parametric gestures).

More coming later...