johnwdubois / rezonator

Rezonator: Dynamics of human engagement

Utterance stacker: Quick improvements #1431

Open kayaulai opened 1 year ago

kayaulai commented 1 year ago

JWD: See detailed comments below.

johnwdubois commented 1 year ago

My suggestion: (see #1446 )

  1. Keep the current utterance concatenation rule (which concatenates successive units by the same speaker into one utterance), with the following exceptions:
    • follow the concatenation rule only while all units in the sequence are verbal (unitType = verbal), NOT when they are non-verbal (see item 4)
    • follow it only while gapUnits < 6; otherwise, start a new utterance
  2. Classify units as {verbal, laugh, pause, vocalism, annotation, other}.
    • If a unit contains at least one word (kind = word), then unitType = verbal
    • Else, if it contains a laugh, then unitType = laugh
    • Else, if it contains a pause or in-breath (or both), then unitType = pause
    • Else, if it contains a vocalism, then unitType = vocalism
    • Else, if it contains ONLY annotation (e.g. transcriber's comments, glosses, etc.), then unitType = annotation
    • Else, unitType = other
  3. Assign utteranceType based on the unitType:
    • if all units in an utterance are verbal (unitType = verbal), then utteranceType = verbal
  4. If a unit is nonverbal (unitType != verbal), then
    • if the next unit by the same participant has the same utteranceType, and gapUnits = 0, then extend the utterance to include it, and assign utteranceType to be the same as its component unitType value(s)
    • if the next unit has a different utteranceType, end the utterance, and assign utteranceType to be the same as its component unitType value(s) (see #1446)
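
One possible reading of the four steps above, as a minimal Python sketch rather than Rezonator's actual implementation. The `Unit` and `Utterance` classes, the `kinds` labels (including `"inbreath"`), and the idea that each unit carries a `gap_units` value measured from the previous unit by the same speaker are all assumptions made for illustration; the thread does not define how gapUnits is computed.

```python
from dataclasses import dataclass, field

MAX_VERBAL_GAP = 6  # step 1: verbal units are concatenated only while gapUnits < 6


@dataclass
class Unit:
    speaker: str
    kinds: list          # hypothetical per-token kind labels, e.g. ["word", "pause"]
    gap_units: int = 0   # assumed: gap since the previous unit by the same speaker


@dataclass
class Utterance:
    speaker: str
    utterance_type: str
    units: list = field(default_factory=list)


def classify_unit(unit):
    """Step 2: the first matching rule in priority order decides unitType."""
    kinds = set(unit.kinds)
    if "word" in kinds:
        return "verbal"
    if "laugh" in kinds:
        return "laugh"
    if kinds & {"pause", "inbreath"}:
        return "pause"
    if "vocalism" in kinds:
        return "vocalism"
    if kinds and kinds <= {"annotation"}:  # ONLY annotation
        return "annotation"
    return "other"


def stack_utterances(units):
    """Steps 1, 3 and 4: group units into utterances that share one type."""
    utterances = []
    open_by_speaker = {}  # most recent utterance for each speaker
    for unit in units:
        unit_type = classify_unit(unit)
        current = open_by_speaker.get(unit.speaker)
        if unit_type == "verbal":
            gap_ok = unit.gap_units < MAX_VERBAL_GAP  # step 1
        else:
            gap_ok = unit.gap_units == 0              # step 4
        if current is not None and current.utterance_type == unit_type and gap_ok:
            current.units.append(unit)
        else:
            # A different type or too large a gap ends the old utterance.
            current = Utterance(unit.speaker, unit_type, [unit])
            utterances.append(current)
            open_by_speaker[unit.speaker] = current
    return utterances


units = [
    Unit("A", ["word", "word"]),
    Unit("A", ["word", "pause"]),      # still verbal: it contains a word
    Unit("A", ["laugh"]),              # nonverbal, so a new utterance starts
    Unit("A", ["laugh"]),              # same type, gap 0: extends the laugh
    Unit("B", ["word"]),
    Unit("B", ["word"], gap_units=7),  # gap >= 6: new verbal utterance
]
for utt in stack_utterances(units):
    print(utt.speaker, utt.utterance_type, len(utt.units))
# A verbal 2
# A laugh 2
# B verbal 1
# B verbal 1
```

Because every utterance in this sketch holds units of a single unitType, utteranceType is simply that shared value, which covers both step 3 (all verbal) and step 4 (nonverbal).
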
kayaulai commented 1 year ago

I'm uncertain about using kind = word, because I fear that will make the stacker too SBC-specific.

johnwdubois commented 1 year ago

Point taken. Still, reference to "kind = word" is just one way to describe the algorithm/pseudocode. The same effect can be gotten by writing a little routine that does the same thing (presumably with a higher error rate, but all you really need is to recognize one word per IU to get the main benefit). (see #1446)
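
As a rough illustration of such a "little routine", here is a hedged Python sketch that guesses whether a unit contains at least one word without relying on a `kind` field. The tokenization and the rule that a word is a run of letters (optionally joined by apostrophes or hyphens) are assumptions for this example, not conventions taken from Rezonator or any particular corpus; the function names are hypothetical.

```python
import re

# Assumed heuristic: a word is one or more letter runs joined by an
# apostrophe or hyphen. Anything bracketed, numeric, or symbolic fails.
WORD_RE = re.compile(r"^[^\W\d_]+(?:['’\-][^\W\d_]+)*$")


def looks_like_word(token):
    """Return True if the token looks like an ordinary word."""
    stripped = token.strip('.,?!;:"')  # drop surrounding punctuation
    return bool(WORD_RE.match(stripped))


def has_word(unit_text):
    """One recognized word is enough to treat the unit as verbal."""
    return any(looks_like_word(tok) for tok in unit_text.split())


print(has_word("(0.8) well ,"))   # True
print(has_word("(0.8) @@"))       # False
print(has_word("don't know."))    # True
```

A heuristic like this will miss some cases (for example, a truncated word such as `wha-` is not recognized), which is the higher error rate mentioned above; it only has to find one word per unit to mark the unit as verbal.
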