johnwdubois / rezonator

Rezonator: Dynamics of human engagement

Speaker label rule for IGT #691

Closed johnwdubois closed 3 years ago

johnwdubois commented 3 years ago

Background

In many transcriptions, Speaker labels require special treatment. In effect, they are different from any other field that typically occurs in an Interlinear Glossed Text (IGT), whether in Scription or another convention. To ease the burden on researchers in fields such as language documentation, it is useful to recognize some basic realities about Speaker labels:

  1. In a pure monologue, the Speaker can be specified once at the Discourse level. In this case, no Speaker label markup is needed at the Unit or Utterance level (assuming the Speaker is listed elsewhere, such as in a header, or in a table or catalogue of the discourses in a corpus).
  2. But if there is any kind of interaction between discourse participants at all (for example, in a conversation or interview), it is important to specify, for each utterance, which Speaker produced it.
  3. Nevertheless, in certain kinds of data (such as interviews), the Speaker may change only a few times during the Discourse. For example, there may be some back-and-forth exchanges at the beginning of the recorded Discourse, and a few more of the same kind at the end, but in the main body of the Discourse, the Speaker remains constant.
  4. For this kind of data, it is useful to have a special Speaker Rule for the markup and interpretation of Speaker identity in IGT data (such as Scription).

(NOTES:

  1. Other terms for Speaker that are commonly used are Participant and Agent.
  2. In the following, we follow the Scription convention which marks speaker labels using backslash "\sp". For details, see #665)

Speaker rule

  1. In an IGT file, identify the value for the CurrentSpeaker as follows:
    • When a backslash "\sp" Marker occurs as the first character (with no whitespace before it) of the first line of the first standard (non-header) Block:
    • This string = CurrentSpeaker
    • And: Speaker = CurrentSpeaker
  2. If the next Block does NOT contain a "\sp" Marker, then carry over the previously assigned value for CurrentSpeaker. That is, for this Block Speaker = CurrentSpeaker
  3. If the next Block DOES contain a "\sp" Marker, take this as the new value for Speaker (and also for CurrentSpeaker)
  4. Continue testing each new Block, assigning a new value for CurrentSpeaker only when a new backslash "\sp" Marker appears.
  5. If the first standard (non-header) Block does not contain a Speaker marker, then no Block is allowed to have a Speaker marker. A file that lacks a Speaker marker on the first Block, but contains a Speaker marker on a later Block, is considered an invalid file. Show the user a message: "Invalid file: Inconsistent markers for Speaker".
  6. For IGT files that contain no markers (other than the Speaker marker), the standard Scription conventions for 2-line, 3-line, and 4-line should apply (that is, they remain unaffected by the presence or absence of Speaker markers).
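
The carry-over logic in steps 1 to 5 can be sketched in Python as below. This is only an illustration, not Rezonator's or Scription's actual implementation. It assumes the file has already been split into standard (non-header) Blocks, each a list of lines; that the "\sp" Marker always sits on the first line of its Block; and that the speaker label is the text following the Marker. The names `speaker_label`, `resolve_speakers`, and `InvalidSpeakerMarking` are hypothetical.

```python
SPEAKER_MARKER = "\\sp"  # the Scription speaker marker, i.e. the text \sp


class InvalidSpeakerMarking(ValueError):
    """Raised when Speaker markers are used inconsistently (step 5)."""


def speaker_label(block):
    """Return the label after a leading speaker marker, or None if absent.

    Assumption: the marker must be the first thing on the Block's first
    line, with no whitespace before it, and the label is whatever follows.
    """
    first_line = block[0] if block else ""
    if not first_line.startswith(SPEAKER_MARKER):
        return None
    parts = first_line.split(None, 1)
    if parts[0] != SPEAKER_MARKER:  # e.g. some other marker that begins with \sp
        return None
    return parts[1].strip() if len(parts) > 1 else ""


def resolve_speakers(blocks):
    """Assign a Speaker to every Block, carrying CurrentSpeaker forward."""
    speakers = []
    current_speaker = None
    for index, block in enumerate(blocks):
        label = speaker_label(block)
        if label is not None:
            # Step 5: a marker on a later Block is invalid if the first
            # Block had none.
            if index > 0 and current_speaker is None:
                raise InvalidSpeakerMarking(
                    "Invalid file: Inconsistent markers for Speaker"
                )
            current_speaker = label  # steps 1 and 3
        speakers.append(current_speaker)  # step 2: carry over
    return speakers
```

For example, `resolve_speakers([["\\sp A", "hello"], ["world"], ["\\sp B", "hi"]])` returns `["A", "A", "B"]`, while a file whose first Block has no marker but whose second Block does raises `InvalidSpeakerMarking`. Because the check in step 5 depends on the first Block, the function needs the whole list of Blocks rather than one Block at a time.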

Additional information

  1. The "\sp" marker is similar to the Note marker "\n" in Scription, which is optional (not required in every line).
  2. The Speaker Rule may require special treatment for data validation in Scription, because it must be evaluated at the Discourse level, not on a line-by-line (or Block-by-Block) basis.

Resources

  1. For a description of the use of "\sp" for speaker labels in Scription, see:
  2. See also #665

dwhieb commented 3 years ago

[scription] speaker label