xiabingquan opened this issue 3 years ago
`start` and `end` are times in seconds. `startOffset` and `endOffset` are character positions in the transcript you supplied: `startOffset` is the index of the word's first letter, and `endOffset` is the index just after its last letter.
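A minimal Python sketch of how the two pairs of keys relate, using a small hand-written dict shaped like Gentle's output (the values are made up for illustration, not real Gentle output). It assumes `endOffset` is exclusive, so `transcript[startOffset:endOffset]` recovers the word as written; `describe` is a hypothetical helper name.

```python
# Hypothetical Gentle-style result; the numbers are illustrative only.
result = {
    "transcript": "Hello world",
    "words": [
        {"word": "Hello", "alignedWord": "hello", "case": "success",
         "start": 0.50, "end": 0.92, "startOffset": 0, "endOffset": 5},
        {"word": "world", "alignedWord": "world", "case": "success",
         "start": 0.95, "end": 1.40, "startOffset": 6, "endOffset": 11},
    ],
}


def describe(word, transcript):
    """Return (text from offsets, start s, end s, duration s) for one word."""
    # Offsets index characters in the transcript (assumed end-exclusive).
    text = transcript[word["startOffset"]:word["endOffset"]]
    # start/end are times in seconds within the audio.
    duration = round(word["end"] - word["start"], 3)
    return text, word["start"], word["end"], duration


for w in result["words"]:
    print(describe(w, result["transcript"]))
```

The offsets say nothing about time; they only locate the word in the text, while `start` and `end` locate it in the audio.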
NAME | TYPE | RELATIONSHIP | PURPOSE |
---|---|---|---|
transcript | string | Top-level variable | Contains the full plain-text transcript exactly as you pasted it in (or as generated by Gentle's automatic speech recognition) |
words | array | Top-level variable | Contains an array of word objects with timing and phoneme data for each word in the transcript |
word | string | Child of words[] | Current word, as written/capitalized in the transcript |
alignedWord | string | Child of words[] | Current word, lowercased, as stored in Gentle's pronunciation dictionary. `<unk>` means "unknown": the word is not in Gentle's dictionary, and its phoneme readout in the output HTML page shows OOV ("Out Of Vocabulary") |
case | string | Child of words[] | Indicates whether the current word was successfully aligned (`success`) or not found in the audio (`not-found-in-audio`) |
start | number | Child of words[] | The start time of the current word in seconds |
end | number | Child of words[] | The end time of the current word in seconds |
startOffset | number | Child of words[] | The character offset in the transcript string where the current word begins |
endOffset | number | Child of words[] | The character offset in the transcript string where the current word ends (one past its last character) |
phones | array | Child of words[] | Contains an array of phoneme objects for the current word |
phone | string | Child of phones[] | The ARPAbet phoneme label, which includes the phoneme name and a suffix indicating its position in the current word (`_B` for beginning, `_I` for inside, `_E` for end, `_S` for single-phoneme words) |
duration | number | Child of phones[] | The duration of the phoneme in seconds |
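In the table, each phone carries only a `duration`, not a start time. A minimal Python sketch of recovering per-phone intervals, assuming the phones are listed in order and follow one another without gaps starting at the word's `start` (an assumption, not something the table states). `phone_intervals` is a hypothetical helper name, and the word object is made up for illustration.

```python
def phone_intervals(word):
    """Return [(phoneme, position, start, end)] for each phone of one word.

    Assumes the phones are listed in order and are contiguous, beginning at
    the word's own "start" time, since each phone carries only a duration.
    """
    if word.get("case") != "success":
        return []  # no word-level timing to distribute
    t = word["start"]
    intervals = []
    for p in word.get("phones", []):
        # Split a label such as "hh_B" into the phoneme and its position suffix.
        phoneme, _, position = p["phone"].rpartition("_")
        intervals.append((phoneme, position, round(t, 3),
                          round(t + p["duration"], 3)))
        t += p["duration"]
    return intervals


# Illustrative word object (made-up numbers, not real Gentle output).
hi = {"word": "hi", "case": "success", "start": 1.0, "end": 1.2,
      "phones": [{"phone": "hh_B", "duration": 0.08},
                 {"phone": "ay_E", "duration": 0.12}]}
print(phone_intervals(hi))
```

If the last phone's computed end does not land near the word's `end`, the contiguity assumption probably does not hold for that output.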
The JSON files returned by Gentle contain many keys, such as "start", "startOffset", "end", and "endOffset". Could anyone tell me exactly what these keys mean, and how to compute word-level timing labels from them?
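On the second part of the question: going by the table, word-level timing labels come straight from `start` and `end`; the offsets are only needed to map a word back to its place in the transcript. A minimal Python sketch, with hypothetical helper names `word_labels` and `to_label_track`, that skips any word whose `case` is not `success` (such words are assumed to have no times) and formats the rest as tab-separated start/end/word lines, the layout Audacity uses for label tracks.

```python
import json


def word_labels(result):
    """Return (start, end, word) tuples for every word Gentle aligned.

    Words whose "case" is not "success" (e.g. "not-found-in-audio") are
    skipped, since they are assumed to carry no "start"/"end" times.
    """
    labels = []
    for w in result.get("words", []):
        if w.get("case") == "success" and "start" in w and "end" in w:
            labels.append((w["start"], w["end"], w["word"]))
    return labels


def to_label_track(labels):
    """Format labels as tab-separated "start<TAB>end<TAB>word" lines."""
    return "\n".join(f"{s:.2f}\t{e:.2f}\t{text}" for s, e, text in labels)


if __name__ == "__main__":
    # Illustrative input; in practice use json.load() on Gentle's output file.
    raw = '''{"transcript": "Hello there world", "words": [
      {"word": "Hello", "case": "success", "start": 0.5, "end": 0.92,
       "startOffset": 0, "endOffset": 5},
      {"word": "there", "case": "not-found-in-audio",
       "startOffset": 6, "endOffset": 11},
      {"word": "world", "case": "success", "start": 0.95, "end": 1.4,
       "startOffset": 12, "endOffset": 17}]}'''
    print(to_label_track(word_labels(json.loads(raw))))
```

Checking both `case` and the presence of the time keys keeps the sketch from failing on words that were not aligned.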