Not understanding the returned results; is there more thorough documentation?

xiabingquan commented 3 years ago

The JSON files returned by Gentle contain many keys, such as "start", "startOffset", "end", "endOffset" and others. Could anyone tell me exactly what these keys mean, and how to compute the timing labels of words from them?

lilgandhi1199 commented 3 years ago

Start and End = time in seconds. Start and End Offsets are literally the character positions in the transcript you supplied where the word begins and where it ends.
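To make this concrete, here is a minimal sketch that pulls word timings out of a Gentle result. The helper name `word_timings` and the sample JSON are made up for illustration (not real Gentle output). It assumes two things about the format: that words Gentle could not align may lack `start`/`end`, so only `case == "success"` entries are used, and that `endOffset` is exclusive, so a plain Python slice of the transcript recovers the word.

```python
import json
from typing import List, Tuple


def word_timings(alignment: dict) -> List[Tuple[str, float, float]]:
    """Return (word, start, end) for every word that was aligned successfully."""
    transcript = alignment["transcript"]
    rows = []
    for w in alignment["words"]:
        # Assumption: unaligned words may have no "start"/"end", so skip them.
        if w.get("case") != "success":
            continue
        # Assumption: endOffset is exclusive, so slicing yields the word text.
        text = transcript[w["startOffset"]:w["endOffset"]]
        rows.append((text, w["start"], w["end"]))
    return rows


# Hypothetical, hand-written example in the shape of a Gentle result.
raw = """
{
  "transcript": "Hello world",
  "words": [
    {"word": "Hello", "case": "success", "start": 0.5, "end": 0.9,
     "startOffset": 0, "endOffset": 5},
    {"word": "world", "case": "not-found-in-audio",
     "startOffset": 6, "endOffset": 11}
  ]
}
"""
sample = json.loads(raw)

for text, start, end in word_timings(sample):
    # Print each word with its time span and duration in seconds.
    print(f"{text}: {start:.2f}-{end:.2f} s ({end - start:.2f} s)")
```

With a real file you would replace `json.loads(raw)` with `json.load` on the file Gentle wrote.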

natelawrence commented 4 months ago

NAME	TYPE	RELATIONSHIP	PURPOSE
transcript	string	Top-level variable	Contains the full plain-text transcript, exactly as you pasted it in (or as generated by Gentle's automatic speech recognition)
words	array	Top-level variable	Contains an array of word objects with timing and phoneme data for each word in the transcript
word	string	Child of words[]	Current word, as written/capitalized in the transcript
alignedWord	string	Child of words[]	Current word (all lowercase) as stored in Gentle's pronunciation dictionary. `<unk>` means "unknown", i.e. the word is not in Gentle's dictionary; such words are shown as OOV ("Out Of Vocabulary") in the phoneme display of the output HTML page.
case	string	Child of words[]	Indicates if the current word was successfully aligned (`success`) or not found in the audio (`not-found-in-audio`)
start	number	Child of words[]	The start time of the current word in seconds
end	number	Child of words[]	The end time of the current word in seconds
startOffset	number	Child of words[]	The character offset in the transcript string where the current word begins
endOffset	number	Child of words[]	The character offset in the transcript string just past the end of the current word (so `transcript[startOffset:endOffset]` gives the word)
phones	array	Child of words[]	Contains an array of phoneme objects for the current word
phone	string	Child of phones[]	The ARPAbet phoneme label, which includes the phoneme name and a suffix indicating its position in the current word (`_B` for beginning, `_I` for inside, `_E` for end, `_S` for single-phoneme words)
duration	number	Child of phones[]	The duration of the phoneme in seconds
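Phones carry only a `duration`, not a start time. A minimal sketch for recovering per-phone times, assuming the phones are contiguous and begin at the word's `start` (worth checking that the last phone's end matches the word's `end` on your own data): accumulate the durations from the word's start. The helper `phone_timings` and the sample word are hypothetical, made up for illustration.

```python
from typing import List, Tuple


def phone_timings(word: dict) -> List[Tuple[str, str, float, float]]:
    """Return (phoneme, position, start, end) for each phone of an aligned word."""
    if word.get("case") != "success":
        return []  # no timing data for words that were not aligned
    t = word["start"]  # assumption: the first phone starts at the word's start
    out = []
    for p in word.get("phones", []):
        # Split a label like "hh_B" into the phoneme "hh" and position tag "B".
        name, _, position = p["phone"].partition("_")
        out.append((name, position, t, t + p["duration"]))
        t += p["duration"]  # assumption: the next phone starts where this one ends
    return out


# Hypothetical word entry in the shape described in the table above.
sample_word = {
    "word": "Hi", "alignedWord": "hi", "case": "success",
    "start": 1.0, "end": 1.25,
    "phones": [
        {"phone": "hh_B", "duration": 0.1},
        {"phone": "ay_E", "duration": 0.15},
    ],
}

for name, position, start, end in phone_timings(sample_word):
    print(f"{name} ({position}): {start:.2f}-{end:.2f} s")
```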

lowerquality / gentle #292