microsoft / cognitive-services-speech-sdk-js

Microsoft Azure Cognitive Services Speech SDK for JavaScript

[Bug]: Honor Pad 9 (Android 13): Speech recognition fails on first attempt, requires double triggering of sttFromMic function #852

Open peterrookie opened 2 months ago

peterrookie commented 2 months ago

What happened?

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. Run the demo https://github.com/Azure-Samples/AzureSpeechReactSample/ on an Honor Pad 9 tablet running Android 13, in a web browser (Chrome, Edge, etc. were tried).
  2. Use the tablet's default microphone (issue doesn't occur with Bluetooth headsets).
  3. Click the green button to start STT.
  4. Speak into the microphone.
  5. Observe that no recognition results are returned, despite the green mic icon in the top right corner indicating that the mic is working.
  6. Without stopping the first recognizer, trigger the sttFromMic function again.
  7. Observe that it now works correctly, receiving and recognizing speech input.

Note: If the previous recognizer is stopped before triggering sttFromMic again, the issue persists.

Any log messages given by the failure

No error messages are displayed in the console. Detailed event logs have been provided, showing the connection and recognition process, but no apparent errors.

Expected/desired behavior

The speech-to-text functionality should work correctly on the first trigger, as it does on other devices.

OS and Version?

Honor Pad 9, Android 13, Magic UI

Versions

Azure Speech SDK version: 1.38.0 (`"microsoft-cognitiveservices-speech-sdk": "^1.38.0"`). Also tried 1.35.0 and 1.32.0.

Mention any other details that might be useful

  • The issue only occurs when using the tablet's default microphone. It works fine with Bluetooth headsets as the audio input device.
  • Most other devices do not exhibit this problem.
  • Attempts to resolve the issue:
    1. Tried delaying the call to startContinuousRecognitionAsync, thinking it might be a device initialization delay issue, but this was ineffective.
    2. Attempted to repeatedly call startContinuousRecognitionAsync without reinitializing the configuration, which was also unsuccessful.
    3. The only effective solution is to trigger the entire sttFromMic function twice.
  • The issue seems specific to this tablet model, as it works normally on other devices.
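
The only effective workaround above (triggering the whole sttFromMic routine twice) could be wrapped as a small retry helper. This is a minimal sketch: all names here (`startWithRetry`, `startFn`, `markResult`) are hypothetical and not part of the Speech SDK. `startFn` would wrap the sample's sttFromMic, and `markResult` would be called from a `recognizing`/`recognized` event handler so a working first attempt skips the second trigger.

```javascript
// Sketch of the double-trigger workaround: start recognition, and if no
// recognition event arrives within timeoutMs, trigger the whole start
// routine again WITHOUT stopping the first recognizer (stopping it first
// was reported not to help).
function startWithRetry(startFn, timeoutMs = 3000, setTimer = setTimeout) {
  let gotResult = false;
  const handle = {
    // Call this from a recognition event handler to cancel the retry.
    markResult: () => { gotResult = true; },
  };
  startFn(handle);          // first trigger
  setTimer(() => {
    if (!gotResult) {
      startFn(handle);      // second trigger, per the report
    }
  }, timeoutMs);
  return handle;
}
```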

Version

1.35.0 (Default)

What browser/platform are you seeing the problem on?

Microsoft Edge

Relevant log output

2024-08-07T10:34:52.081Z | RecognitionTriggeredEvent | privName: RecognitionTriggeredEvent | privEventId: B633F2C4B13847E3AF5ABE554F0FECF6 | privEventTime: 2024-08-07T10:34:52.081Z | privEventType: 1 | privMetadata: {} | privRequestId: BAFA859E76CB457A8F02508FF44FD3FD | privSessionId: <NULL> | privAudioSourceId: 456E21B48B944059BF2AE5AF8EEE1002 | privAudioNodeId: 292C431419F84307A9DE851240E7BE26
2024-08-07T10:34:52.084Z | ConnectingToServiceEvent | privName: ConnectingToServiceEvent | privEventId: 2327B470763444D892C9529ADFE88710 | privEventTime: 2024-08-07T10:34:52.084Z | privEventType: 1 | privMetadata: {} | privRequestId: BAFA859E76CB457A8F02508FF44FD3FD | privSessionId: 1A635C602A524B8C9CB2599D36BCAFAE | privAuthFetchEventid: E5B6A653620A4935AA2F118CBC9BF9D9
2024-08-07T10:34:52.085Z | AudioStreamNodeAttachingEvent | privName: AudioStreamNodeAttachingEvent | privEventId: F46703A3825C4FF08868D59D95AF49A5 | privEventTime: 2024-08-07T10:34:52.085Z | privEventType: 1 | privMetadata: {} | privAudioSourceId: 456E21B48B944059BF2AE5AF8EEE1002 | privAudioNodeId: 292C431419F84307A9DE851240E7BE26
2024-08-07T10:34:52.092Z | AudioSourceInitializingEvent | privName: AudioSourceInitializingEvent | privEventId: ADCC2015673C430E8437478C012DE82C | privEventTime: 2024-08-07T10:34:52.092Z | privEventType: 1 | privMetadata: {} | privAudioSourceId: 456E21B48B944059BF2AE5AF8EEE1002
2024-08-07T10:34:52.095Z | ConnectionStartEvent | privName: ConnectionStartEvent | privEventId: BAEA1AC65DB24267A3199FBDC116D9FE | privEventTime: 2024-08-07T10:34:52.095Z | privEventType: 1 | privMetadata: {} | privConnectionId: 1A635C602A524B8C9CB2599D36BCAFAE | privUri: wss://eastasia.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=zh-CN&format=simple&Ocp-Apim-Subscription-Key=<redacted>&X-ConnectionId=1A635C602A524B8C9CB2599D36BCAFAE | privHeaders: <NULL>
2024-08-07T10:34:52.523Z | ConnectionEstablishedEvent | privName: ConnectionEstablishedEvent | privEventId: E2E34A16C5A048189A943181A1F9F54F | privEventTime: 2024-08-07T10:34:52.523Z | privEventType: 1 | privMetadata: {} | privConnectionId: 1A635C602A524B8C9CB2599D36BCAFAE
2024-08-07T10:34:52.526Z | RecognitionStartedEvent | privName: RecognitionStartedEvent | privEventId: 2BFC179689534BB3AB10BA3177C3D88F | privEventTime: 2024-08-07T10:34:52.526Z | privEventType: 1 | privMetadata: {} | privRequestId: BAFA859E76CB457A8F02508FF44FD3FD | privSessionId: 1A635C602A524B8C9CB2599D36BCAFAE | privAudioSourceId: 456E21B48B944059BF2AE5AF8EEE1002 | privAudioNodeId: 292C431419F84307A9DE851240E7BE26 | privAuthFetchEventId: E5B6A653620A4935AA2F118CBC9BF9D9
2024-08-07T10:34:52.636Z | AudioSourceReadyEvent | privName: AudioSourceReadyEvent | privEventId: 2190283FBE854DD0A1392626E90EAB01 | privEventTime: 2024-08-07T10:34:52.636Z | privEventType: 1 | privMetadata: {} | privAudioSourceId: 456E21B48B944059BF2AE5AF8EEE1002
2024-08-07T10:34:52.641Z | AudioStreamNodeAttachedEvent | privName: AudioStreamNodeAttachedEvent | privEventId: 0283CEB866B2438DAA0B3A451B90F575 | privEventTime: 2024-08-07T10:34:52.641Z | privEventType: 1 | privMetadata: {} | privAudioSourceId: 456E21B48B944059BF2AE5AF8EEE1002 | privAudioNodeId: 292C431419F84307A9DE851240E7BE26
2024-08-07T10:34:52.720Z | ListeningStartedEvent | privName: ListeningStartedEvent | privEventId: 6A7418F16E324FD5BD41CA2158D61B82 | privEventTime: 2024-08-07T10:34:52.720Z | privEventType: 1 | privMetadata: {} | privRequestId: BAFA859E76CB457A8F02508FF44FD3FD | privSessionId: 1A635C602A524B8C9CB2599D36BCAFAE | privAudioSourceId: 456E21B48B944059BF2AE5AF8EEE1002 | privAudioNodeId: 292C431419F84307A9DE851240E7BE26
2024-08-07T10:34:52.723Z | ConnectionMessageSentEvent | privName: ConnectionMessageSentEvent | privEventId: B1584D5095AF44F1B27D1177C004BF03 | privEventTime: 2024-08-07T10:34:52.723Z | privEventType: 1 | privMetadata: {} | privConnectionId: 1A635C602A524B8C9CB2599D36BCAFAE | privNetworkSentTime: 2024-08-07T10:34:52.723Z | privMessage: {"privBody":"{\"context\":{\"system\":{\"name\":\"SpeechSDK\",\"version\":\"1.32.0\",\"build\":\"JavaScript\",\"lang\":\"JavaScript\"},\"os\":{\"platform\":\"Browser/Linux armv81\",\"name\":\"Mozilla/5.0 (Linux; Android 13; HEY2-W09 Build/HONORHEY2-W09; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/116.0.0.0 Safari/537.36\",\"version\":\"5.0 (Linux; Android 13; HEY2-W09 Build/HONORHEY2-W09; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/116.0.0.0 Safari/537.36\"},\"audio\":{\"source\":{\"bitspersample\":16,\"channelcount\":1,\"connectivity\":\"Unknown\",\"manufacturer\":\"Speech SDK\",\"model\":\"\",\"samplerate\":16000,\"type\":\"Microphones\"}}},\"recognition\":\"conversation\"}","privMessageType":0,"privHeaders":{"Path":"speech.config","X-RequestId":"525C3A6402D84F8596784DA5A688F9AB","X-Timestamp":"2024-08-07T10:34:52.723Z","Content-Type":"application/json"},"privId":"3A96BD4EF7BB4A35B3216431A44762B9","privSize":643,"privPath":"speech.config","privRequestId":"525C3A6402D84F8596784DA5A688F9AB","privContentType":"application/json"}
2024-08-07T10:34:52.725Z | ConnectionMessageSentEvent | privName: ConnectionMessageSentEvent | privEventId: 9474C615430049DF8B4188D6367C9888 | privEventTime: 2024-08-07T10:34:52.725Z | privEventType: 1 | privMetadata: {} | privConnectionId: 1A635C602A524B8C9CB2599D36BCAFAE | privNetworkSentTime: 2024-08-07T10:34:52.725Z | privMessage: {"privBody":"{}","privMessageType":0,"privHeaders":{"Path":"speech.context","X-RequestId":"525C3A6402D84F8596784DA5A688F9AB","X-Timestamp":"2024-08-07T10:34:52.725Z","Content-Type":"application/json"},"privId":"C53F12D64995441F9E65BEAFEAD39F03","privSize":2,"privPath":"speech.context","privRequestId":"525C3A6402D84F8596784DA5A688F9AB","privContentType":"application/json"}
2024-08-07T10:34:52.726Z | ConnectionMessageSentEvent | privName: ConnectionMessageSentEvent | privEventId: 7BFD6786048A4506845C92A505909665 | privEventTime: 2024-08-07T10:34:52.726Z | privEventType: 1 | privMetadata: {} | privConnectionId: 1A635C602A524B8C9CB2599D36BCAFAE | privNetworkSentTime: 2024-08-07T10:34:52.726Z | privMessage: {"privBody":{},"privMessageType":1,"privHeaders":{"Path":"audio","X-RequestId":"525C3A6402D84F8596784DA5A688F9AB","X-Timestamp":"2024-08-07T10:34:52.726Z","Content-Type":"audio/x-wav"},"privId":"EC51C0BDD9014944A5B9F2B9D68CAF12","privSize":44,"privPath":"audio","privRequestId":"525C3A6402D84F8596784DA5A688F9AB","privContentType":"audio/x-wav"}
2024-08-07T10:34:52.842Z | ConnectionMessageReceivedEvent | privName: ConnectionMessageReceivedEvent | privEventId: BAE3584F71DA4AF9A28BA4BF68EF7859 | privEventTime: 2024-08-07T10:34:52.842Z | privEventType: 1 | privMetadata: {} | privConnectionId: 1A635C602A524B8C9CB2599D36BCAFAE | privNetworkReceivedTime: 2024-08-07T10:34:52.842Z | privMessage: {"privBody":"{\n \"context\": {\n \"serviceTag\": \"36d26497afe1419aa9f7110b5f36ff94\"\n }\n}","privMessageType":0,"privHeaders":{"x-requestid":"525C3A6402D84F8596784DA5A688F9AB","path":"turn.start","content-type":"application/json; charset=utf-8"},"privId":"9A5F4DCE30684984B59D935573A0204B","privSize":75}
BrianMouncer commented 2 months ago

@peterrookie When the first attempt fails, do you hit the error case? Could you expand the logging to find the error returned by the recognition? I am wondering if there is some kind of race condition with the auth server starting up, or some other delay that the basic sample is not set up to handle.

    recognizer.recognizeOnceAsync(result => {
        if (result.reason === ResultReason.RecognizedSpeech) {
            setDisplayText(`RECOGNIZED: Text=${result.text}`);
        } else {
            setDisplayText('ERROR: Speech was cancelled or could not be recognized. Ensure your microphone is working properly.');
        }
    });
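
To expand the logging as suggested, the JS Speech SDK's `canceled` event carries `reason`, `errorCode`, and `errorDetails` fields, and the `sessionStarted`/`sessionStopped` events carry a `sessionId`. A minimal sketch (the `describeCancellation` helper is introduced here for illustration, not an SDK API):

```javascript
// Formats cancellation info into one log line. The helper itself is
// hypothetical; the reason / errorCode / errorDetails fields mirror the
// JS Speech SDK's canceled event args.
function describeCancellation(e) {
  return `CANCELED: reason=${e.reason} code=${e.errorCode} details=${e.errorDetails}`;
}

// Wiring, assuming `recognizer` is created as in the sample:
// recognizer.canceled = (sender, e) => console.log(describeCancellation(e));
// recognizer.sessionStarted = (sender, e) => console.log(`session started: ${e.sessionId}`);
// recognizer.sessionStopped = (sender, e) => console.log(`session stopped: ${e.sessionId}`);
```

If the first attempt on this tablet produces a `canceled` event, its `errorDetails` should say whether the failure is an auth/connection timeout or something audio-related.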