johnwdubois / rezonator

Rezonator: Dynamics of human engagement

Draw words in sync with audio playback #114

Open johnwdubois opened 5 years ago

johnwdubois commented 5 years ago

Background

  1. Seeing the words of a conversation drawn to the screen at the same time as you are hearing the audio can be useful for visualizing talk. This is key for doing experiments on splitting intonation units in a language you don't know.
  2. The intended effect is as if Rezonator "hears" the words as they are spoken, updating each token automatically to show in black on the main screen.

What to do

  1. Synchronize the drawing of words to the screen with the simultaneous playback of audio. (Call this Sync-Play.)
  2. To visualize Sync-Play, when the user is playing audio for a given unit, change the text color from grey to black for each token as it is played:
    • All tokens in units with a UnitEnd time earlier than the current playback time are shown in black
    • All tokens in units with a UnitStart value later than the current playback time are shown in grey
    • Only the currently playing Unit (or 2 or more overlapping Units; see below) has a mix of black text and grey text, updating dynamically
  3. To get the timestamps needed to sync the drawing of a word with the audio currently being heard, use one of 2 ways:
    • Estimate when the word is spoken based on the UnitStart time, UnitEnd time, and number of words in the current unit.
    • If available, use word-level timestamps provided in the original imported file
  4. Overlapping speech by two different speakers represents a special challenge, which must be addressed as follows:
    • Because overlapping words in 2 different units may occur at the same time, both should be updated (switch from grey to black) at the same time
    • Each unit should be updated according to its own timeline; so 2 (or more) timelines must be managed at the same time.
    • (For audio playback, avoid playing the same sound twice)
  5. Never use Sequence values when actual time values are available:
    • for Units: instead of using UnitSeq, use UnitStart and UnitEnd
    • for tokens: instead of using DocSeq, use UnitStart and UnitEnd, plus the Order value for the token within its Unit
    • for tokens: Only if no UnitStart and UnitEnd values are available, use UnitSeq, plus the Order value for the token within its Unit
  6. If timestamps are available at the word level, consider using those for drawing words.
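The timing rules in items 2 through 6 can be sketched as follows. This is a minimal Python sketch, not Rezonator's GML implementation. The classes `Token` and `Unit`, the functions `token_onset`, `token_color`, `frame_colors`, and `drawing_order`, and the use of seconds as the time unit are all hypothetical. For the estimate in item 3, the sketch assumes that a unit's duration is split evenly among its tokens in Order. The issue only says the estimate uses UnitStart, UnitEnd, and the word count.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    text: str
    order: int                     # Order of the token within its unit
    start: Optional[float] = None  # word-level timestamp in seconds, if imported


@dataclass
class Unit:
    unit_seq: int                       # UnitSeq
    tokens: List[Token]
    unit_start: Optional[float] = None  # UnitStart in seconds
    unit_end: Optional[float] = None    # UnitEnd in seconds

    def has_times(self) -> bool:
        return self.unit_start is not None and self.unit_end is not None


def token_onset(unit: Unit, token: Token) -> Optional[float]:
    """Time at which `token` is heard, or None if it cannot be determined."""
    if token.start is not None:
        return token.start  # prefer word-level timestamps when present
    if not unit.has_times():
        return None
    # Assumed estimate: split the unit's duration evenly, so the k-th token
    # (0-based, by Order) starts k/n of the way through the unit.
    orders = sorted(t.order for t in unit.tokens)
    k = orders.index(token.order)
    step = (unit.unit_end - unit.unit_start) / len(orders)
    return unit.unit_start + k * step


def token_color(unit: Unit, token: Token, now: float) -> str:
    """'black' if the token has been heard by playback time `now`, else 'grey'."""
    if unit.has_times():
        if unit.unit_end < now:
            return "black"  # the whole unit has already played
        if unit.unit_start > now:
            return "grey"   # the unit has not started yet
    onset = token_onset(unit, token)
    return "black" if onset is not None and onset <= now else "grey"


def frame_colors(units: List[Unit], now: float) -> Dict[Tuple[int, int], str]:
    """Color of every token, keyed by (UnitSeq, Order), at playback time `now`.

    Every unit is checked against the same clock on its own timeline, so
    overlapping units by different speakers switch to black together while
    the audio itself is played only once.
    """
    return {(u.unit_seq, t.order): token_color(u, t, now)
            for u in units for t in u.tokens}


def drawing_order(units: List[Unit]) -> List[Tuple[int, int]]:
    """(UnitSeq, Order) pairs ordered by time, falling back to UnitSeq."""
    use_times = all(u.has_times() for u in units)

    def unit_key(u: Unit) -> tuple:
        # UnitSeq is only a tie-breaker when real times exist.
        return (u.unit_start, u.unit_end, u.unit_seq) if use_times else (u.unit_seq,)

    return [(u.unit_seq, t.order)
            for u in sorted(units, key=unit_key)
            for t in sorted(u.tokens, key=lambda t: t.order)]
```

Take speaker A's unit running from 0.0 to 2.0 s with four tokens and speaker B's overlapping unit running from 1.0 to 3.0 s with two tokens. At 1.2 s, `frame_colors` shows A's first three tokens and B's first token in black and the rest in grey. Evaluating every unit against one shared playback clock, instead of advancing a separate cursor per unit, is one simple way to keep two timelines in step. That is a design choice of this sketch, not something the issue prescribes.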

Resources

  1. See resources for audio & video:
    • #1407
    • #1150
  2. See the GML asset Audio Visualizer.
  3. See also:
    • #112
    • #116

johnwdubois commented 2 years ago