Hi,
The recognition result should contain duration and offset attributes, see here: https://github.com/jabber-tools/cognitive-services-speech-sdk-rs/blob/main/src/speech/speech_recognition_result.rs#L19-L20
Right now these are defined as strings (I should probably change them to a proper type), but it should work. Did you try it? Does it return these attributes?
Yes, I can use the offset and duration of the entire utterance. However, I would like to use each word and its offset and duration as shown below.
{
  "Id": "791d3f8a724846f69e9d9256947d2479",
  "RecognitionStatus": "Success",
  "Offset": 500000,
  "Duration": 13000000,
  "DisplayText": "What's the weather like?",
  "NBest": [
    {
      "Confidence": 0.97660327,
      "Lexical": "what's the weather like",
      "ITN": "what's the weather like",
      "MaskedITN": "what's the weather like",
      "Display": "What's the weather like?",
      "Words": [
        {
          "Word": "what's",
          "Offset": 500000,
          "Duration": 3900000
        },
        {
          "Word": "the",
          "Offset": 4500000,
          "Duration": 1300000
        },
        {
          "Word": "weather",
          "Offset": 5900000,
          "Duration": 2900000
        },
        {
          "Word": "like",
          "Offset": 8900000,
          "Duration": 4600000
        }
      ]
    },
According to https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/665, if I call RequestWordLevelTimestamps and set OutputFormat to Detailed, I can get the word-level timestamps. The corresponding code in this crate: https://github.com/jabber-tools/cognitive-services-speech-sdk-rs/blob/main/src/speech/speech_config.rs#L245-L250 and https://github.com/jabber-tools/cognitive-services-speech-sdk-rs/blob/main/src/speech/speech_config.rs#L324-L335
I called RequestWordLevelTimestamps and set OutputFormat to Detailed, but I could not get NBest in the result. So to get NBest, it seems we need to add an NBest field here: https://github.com/jabber-tools/cognitive-services-speech-sdk-rs/blob/main/src/speech/speech_recognition_result.rs#L19-L20
Hi,
There is no need to extend the SpeechRecognitionResult struct in any way. Just do exactly what they advise in the above-mentioned issue 665, i.e.:
event.result.properties.get_property(PropertyId::SpeechServiceResponseJsonResult, "N/A")
Let me know if you have any problems with it. I used one of the provided examples and got this working with the above-mentioned tweaks.
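For anyone post-processing that JSON: the Offset and Duration values in the Words array are reported in 100-nanosecond ticks, so they usually need converting before use. Below is a minimal sketch of that conversion using only the standard library. It assumes the JSON has already been parsed with a JSON crate of your choice; the WordTiming struct and the ticks_to_duration helper are hypothetical names for illustration and are not part of the SDK.

```rust
use std::time::Duration;

/// The service reports `Offset` and `Duration` in 100-nanosecond ticks.
const NANOS_PER_TICK: u64 = 100;

/// One entry of the `Words` array, with times still in ticks.
/// Hypothetical struct for illustration; not part of the SDK.
#[derive(Debug, Clone, PartialEq)]
struct WordTiming {
    word: String,
    offset_ticks: u64,
    duration_ticks: u64,
}

/// Convert a tick count into a `std::time::Duration`.
fn ticks_to_duration(ticks: u64) -> Duration {
    Duration::from_nanos(ticks * NANOS_PER_TICK)
}

impl WordTiming {
    /// Time from the start of the audio to the start of the word.
    fn start(&self) -> Duration {
        ticks_to_duration(self.offset_ticks)
    }

    /// Time from the start of the audio to the end of the word.
    fn end(&self) -> Duration {
        ticks_to_duration(self.offset_ticks + self.duration_ticks)
    }
}

fn main() {
    // Values copied from the sample JSON earlier in this thread.
    let words = vec![
        WordTiming { word: "what's".into(), offset_ticks: 500_000, duration_ticks: 3_900_000 },
        WordTiming { word: "the".into(), offset_ticks: 4_500_000, duration_ticks: 1_300_000 },
        WordTiming { word: "weather".into(), offset_ticks: 5_900_000, duration_ticks: 2_900_000 },
        WordTiming { word: "like".into(), offset_ticks: 8_900_000, duration_ticks: 4_600_000 },
    ];

    for w in &words {
        println!(
            "{:>8}: {:.2}s - {:.2}s",
            w.word,
            w.start().as_secs_f64(),
            w.end().as_secs_f64()
        );
    }
}
```

Keeping the raw tick counts in the struct and converting on demand avoids losing precision if the values are later written back out or compared against the utterance-level Offset and Duration.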
I tried the above method and got the desired result. Thank you so much!
glad to help, closing the issue now.
https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/665 Hi, I'd like to use word/phrase-level timestamps as shown in the issue above. Is there any possibility of supporting this?