microsoft / cognitive-services-speech-sdk-js

Microsoft Azure Cognitive Services Speech SDK for JavaScript

[Bug]: Real-Time Speech-to-Text Lag and Synchronization Problems on Low-Power Devices #806

Open hardik-veloxcore opened 3 months ago

hardik-veloxcore commented 3 months ago

I am facing issues with real-time speech-to-text on low-power devices. It works fine on devices with a good processor.

But in Google Chrome with the CPU throttled to 6x slower than normal, speech recognition lags and does not return text from the audio input in real time.

[Screenshot 2024-04-02 181525]


I have to click 'stop listening' before any text appears. This also affects the frame rate (fps), because the WebSocket continuously processes the audio input.
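
To put a number on the fps impact described above, frame timestamps can be summarized as in the sketch below. This is only an illustration: `summarizeFrames` and `FrameStats` are hypothetical names, not part of the Speech SDK, and the 25 ms default budget (1.5 times a 60 fps frame) is an assumed threshold.

```typescript
// Hypothetical helper (not part of the Speech SDK): summarizes frame pacing
// from a list of timestamps in milliseconds, such as those passed to
// requestAnimationFrame callbacks in a browser.
interface FrameStats {
  averageFps: number;     // frames per second over the whole sample
  longestFrameMs: number; // largest gap between two consecutive frames
  slowFrames: number;     // number of gaps longer than budgetMs
}

function summarizeFrames(timestampsMs: number[], budgetMs: number = 25): FrameStats {
  if (timestampsMs.length < 2) {
    return { averageFps: 0, longestFrameMs: 0, slowFrames: 0 };
  }
  let longestFrameMs = 0;
  let slowFrames = 0;
  for (let i = 1; i < timestampsMs.length; i++) {
    const delta = timestampsMs[i] - timestampsMs[i - 1];
    if (delta > longestFrameMs) longestFrameMs = delta;
    if (delta > budgetMs) slowFrames++;
  }
  const elapsedMs = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
  const frameCount = timestampsMs.length - 1;
  const averageFps = elapsedMs > 0 ? (frameCount * 1000) / elapsedMs : 0;
  return { averageFps, longestFrameMs, slowFrames };
}
```

Collecting the timestamps once with recognition running and once without, both under the same CPU throttling, would show how much of the frame-rate drop is attributable to the recognition session.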

The same issue occurs with real-time speech-to-text on https://speech.microsoft.com.

Version

1.33.1 (Default)

What browser/platform are you seeing the problem on?

Chrome

Relevant log output

X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:turn.start Content-Type:application/json; charset=utf-8 { "context": { "serviceTag": "4247a50e319645de98ab0f2891b2af9b" } }   186 
18:22:22.659
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.startDetected Content-Type:application/json; charset=utf-8 {"Offset":11200000} 140 
18:22:26.124
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.hypothesis Content-Type:application/json; charset=utf-8 {"Id":"8d17fe9b2e1c4e4a8c007dc36e5a25c7","Text":"are you","Offset":11200000,"Duration":2800000,"PrimaryLanguage":{"Language":"en-us"},"Channel":0} 264 
18:22:26.211
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.hypothesis Content-Type:application/json; charset=utf-8 {"Id":"0bd4832a28d840bf86640987a32ba715","Text":"are you rating","Offset":8400000,"Duration":6800000,"PrimaryLanguage":{"Language":"en-us"},"Channel":0}   270 
18:22:26.374
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.hypothesis Content-Type:application/json; charset=utf-8 {"Id":"f6a2b00ab81241a7b8a45f83e587c763","Text":"are you rating my point","Offset":8400000,"Duration":12000000,"PrimaryLanguage":{"Language":"en-us"},"Channel":0} 280 
18:22:26.555
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.hypothesis Content-Type:application/json; charset=utf-8 {"Id":"d82b5381f072432280caeb78ea13a65a","Text":"are you getting opponent","Offset":8400000,"Duration":12000000,"PrimaryLanguage":{"Language":"en-us"},"Channel":0}    281 
18:22:26.719
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.phrase Content-Type:application/json; charset=utf-8 {"Id":"813b56e3fa8a446ba38f715cc63cbca7","RecognitionStatus":"Success","Offset":8400000,"Duration":12000000,"Channel":0,"DisplayText":"Are you getting opponent?","NBest":[{"Confidence":0.40714973,"Lexical":"are you getting opponent","ITN":"are you getting opponent","MaskedITN":"are you getting opponent","Display":"Are you getting opponent?","Words":[{"Word":"are","Offset":8400000,"Duration":1200000},{"Word":"you","Offset":9600000,"Duration":1200000},{"Word":"getting","Offset":11200000,"Duration":4000000},{"Word":"opponent","Offset":15600000,"Duration":4800000}]},{"Confidence":0.3220464,"Lexical":"are you rating my point","ITN":"are you rating my point","MaskedITN":"are you rating my point","Display":"are you rating my point","Words":[{"Word":"are","Offset":8400000,"Duration":1600000},{"Word":"you","Offset":10400000,"Duration":1200000},{"Word":"rating","Offset":11600000,"Duration":3600000},{"Word":"my","Offset":15200000,"Duration":1200000},{"Word":"point","Offset":16400000,"Duration":4000000}]}]} 1124    
18:22:26.979
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.endDetected Content-Type:application/json; charset=utf-8 {"Offset":36000000}   138 
18:22:27.367
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:speech.phrase Content-Type:application/json; charset=utf-8 {"Id":"60b6172d4347483a999ebb01e3594763","RecognitionStatus":"EndOfDictation","Offset":36000000,"Duration":0,"Channel":0}  235 
18:22:27.409
X-RequestId:6F1B66FF7441470BB5FAD533C364D98A Path:turn.end Content-Type:application/json; charset=utf-8 { "StatusCode": "Success" }
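
For reading the log above: the Speech service reports `Offset` and `Duration` in 100-nanosecond ticks, so dividing by 10,000 gives milliseconds of audio time. The sketch below pulls the path, text, and audio times out of one logged frame. `parseFrame` and `ParsedFrame` are hypothetical names, and the parsing assumes each frame looks like the lines above (headers, then one JSON body, optionally followed by a byte count).

```typescript
// 1 tick = 100 ns, so 10,000 ticks = 1 ms.
const TICKS_PER_MS = 10000;

interface ParsedFrame {
  path: string;                // e.g. "speech.hypothesis"
  text: string | null;         // "Text" or "DisplayText", if present
  audioStartMs: number | null; // Offset converted to ms
  audioEndMs: number | null;   // (Offset + Duration) converted to ms
}

// Hypothetical helper: parses one logged WebSocket frame as shown in this issue.
function parseFrame(frame: string): ParsedFrame | null {
  const pathMatch = /Path:(\S+)/.exec(frame);
  const bodyStart = frame.indexOf("{");
  const bodyEnd = frame.lastIndexOf("}"); // ignores a trailing byte count
  if (!pathMatch || bodyStart < 0 || bodyEnd < bodyStart) return null;

  let body: Record<string, unknown>;
  try {
    body = JSON.parse(frame.slice(bodyStart, bodyEnd + 1));
  } catch {
    return null; // body was not valid JSON
  }

  const offset = typeof body["Offset"] === "number" ? (body["Offset"] as number) : null;
  const duration = typeof body["Duration"] === "number" ? (body["Duration"] as number) : 0;
  const rawText = typeof body["Text"] === "string" ? body["Text"] : body["DisplayText"];
  const text = typeof rawText === "string" ? rawText : null;

  return {
    path: pathMatch[1],
    text,
    audioStartMs: offset === null ? null : offset / TICKS_PER_MS,
    audioEndMs: offset === null ? null : (offset + duration) / TICKS_PER_MS,
  };
}
```

Applied to the first `speech.hypothesis` frame, this gives an audio span of 1120 ms to 1400 ms for "are you". The final phrase covers 840 ms to 2040 ms of audio, while every recognition frame in the log is stamped within roughly 1.3 seconds of wall-clock time (18:22:26.124 to 18:22:27.409), which is consistent with results arriving in a burst rather than incrementally.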

glharper commented 3 months ago

@hardik-veloxcore Thank you for using the Speech SDK and for writing this issue up. Using the attached sample HTML file, I ran a continuous recognition session on this test file with the CPU at 6x slowdown, but saw no issue with the recognition results returned.

Would you mind standing up a web page that demonstrates this issue at 6x slowdown with that file?

Attachment: index.html.zip
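
A repro page for a request like this could also record when each result reaches the page relative to the audio it covers, which makes the lag easier to compare across machines. The following is a minimal sketch under stated assumptions: `createLatencyProbe`, `ProbeEntry`, and `ResultLike` are hypothetical names, `ResultLike` only mirrors the `text`, `offset`, and `duration` fields of the SDK's recognition result, and the lag figure is approximate because it assumes the probe is created at about the moment the audio stream starts.

```typescript
// Structural stand-in for the SDK's recognition result (assumption: only
// these three fields are needed; offset and duration are 100-ns ticks).
interface ResultLike {
  text: string;
  offset: number;
  duration: number;
}

interface ProbeEntry {
  kind: "recognizing" | "recognized";
  text: string;
  arrivedMs: number;  // ms between probe creation and the callback firing
  audioEndMs: number; // end of the covered audio, ms from stream start
  lagMs: number;      // arrivedMs - audioEndMs (approximate, see lead-in)
}

// Hypothetical helper: collects one entry per result handed to record().
// The clock is injectable so the probe can be exercised without a microphone.
function createLatencyProbe(now: () => number = () => Date.now()) {
  const startedAt = now();
  const entries: ProbeEntry[] = [];
  return {
    entries,
    record(kind: ProbeEntry["kind"], result: ResultLike): void {
      const arrivedMs = now() - startedAt;
      const audioEndMs = (result.offset + result.duration) / 10000;
      entries.push({
        kind,
        text: result.text,
        arrivedMs,
        audioEndMs,
        lagMs: arrivedMs - audioEndMs,
      });
    },
  };
}

// Intended wiring with an SDK recognizer (not executed in this sketch):
//   const probe = createLatencyProbe();
//   recognizer.recognizing = (_sender, e) => probe.record("recognizing", e.result);
//   recognizer.recognized = (_sender, e) => probe.record("recognized", e.result);
```

If the `lagMs` values for `recognizing` entries keep growing under CPU throttling but stay flat without it, that would point at the page falling behind on audio capture or upload rather than at the service.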

hardik-veloxcore commented 3 months ago

https://github.com/microsoft/cognitive-services-speech-sdk-js/assets/165795702/caa6ac12-87a8-44b5-aeb4-8ffed5c66ce7

I am attaching a video of the issue here so you can see what the problem is. The same issue happened in my application too.

ralph-msft commented 2 months ago

[B-7141286]

Hardik-Rana commented 1 month ago

Any updates on it?