phetsims / utterance-queue

Alerting library powered by aria-live
MIT License
0 stars 2 forks source link

iOS Safari often doesn't trigger start or end events on SpeechSynthesisUtterance #66

Open jessegreenberg opened 2 years ago

jessegreenberg commented 2 years ago

While working on related issue https://github.com/phetsims/number-play/issues/134#issuecomment-1032066605 and #65 I am seeing that the start and end event listeners frequenly fail to fire in iOS Safari. Causing an assertion in UtteranceQueue because the this.announcingUtteranceWrapper isn't cleared on the end event. And reading blocks do not highlight because the start event doesn't fire.

I tried re-introducing the safari workaround removed in https://github.com/phetsims/utterance-queue/issues/52, but it doesn't seem to fix it always. Though maybe it is better? It is hard to say, it seems to happen randomly.

jessegreenberg commented 2 years ago

I am not too interested in finding a way to "trick" Safari into always firing these events. It seems too unpredictable. Instead I am wondering if we can find a way to handle this gracefully and fallback to some behavior when we don't get these events.

jessegreenberg commented 2 years ago

Stream of consciousness notes to help think it through...

Sometimes we don't get a start event or an end event and synth.speaking is false. It is as though we never requested speech in SpeechSynthesisAnnouncer. But listeners are added in UtteranceQueue assuming they will be removed by the end event that we never get.

We need to gracefully handle these failure cases somehow, detecting when we request speech but speech never starts. This way the SpeechSynthesisAnnouncer can notify the UtteranceQueue that listeners should be removed.

Proposal: Keep track of when speech is requested in SpeechSynthesisAnnouncer. If synth.speaking is not true in a reasonable amount of time then we can assume there has been a failure and we will get no events nor start speaking. We cancel the synth and remove listeners. Something to think about is adding this failed utterance to the back of the queue again to try until we have a success.

EDIT: Although, if we keep track of the time before the start event instead of the time before synth.speaking becomes true then this workaround may also work for the ChromeOS issue in https://github.com/phetsims/number-play/issues/138#issuecomment-1032885116

jessegreenberg commented 2 years ago

The proposal is working really well so I'm going to commit it. The fundamental changes are 1) There is now a pendingUtterance on SpeechSynthesisAnnouncer that is set immediatelety in requestSpeech. It's existence well prevent the Announcer from setting readyToSpeak = true so that the UtteranceQueue cannot announce anything if we are still waiting for something to speak. It could take longer than VOICING_UTTERANCE_INTERVAL between the call to getSynth().speak() and the time getSynth().speaking becomes true. 2) If the amount of time between setting the pendingUtterance and the start event is greater than PENDING_UTTERANCE_DELAY we can assume there has been a failure of the engine and we cancel speech and remove listeners before the next attempt to speak.

jessegreenberg commented 2 years ago

Next, I want to fix #65 and then Ill be able to verify if this is fixed in number-play where it was originally found. Finally, Ill check and see if this is working in ChromeOS.

jessegreenberg commented 2 years ago

65 is resolved and number-play is working well in iOS Safari. The failure case is now being detected but not handled in ChromeOS. Lets continue that part of the issue in #64.

@zepumph could you please review the two commits here and the summary of these changes in https://github.com/phetsims/utterance-queue/issues/66#issuecomment-1065728444?

jessegreenberg commented 2 years ago

This implementation could change a bit to be more complete and support the needs of https://github.com/phetsims/joist/issues/782. More specifically, cancel and cancelUtterance need to also cancel the pendingUtterance.

jessegreenberg commented 2 years ago

In the above commit I made it so that we remove listeners on the pending utterance when cancel functions are called. Also changed pendingUtterance to pendingSpeechSynthesisUtteranceWrapper so that we have access to the listeners to remove.

I think this is better but I was wrong about this causing https://github.com/phetsims/joist/issues/782.

Ready for review again, sorry for the confusion let me know if you would like to talk about these changes.

zepumph commented 2 years ago

const PENDING_UTTERANCE_DELAY = 5000;

Can you please write a note about how 5 seconds was chosen, Even if you say "it is kinda random and it worked so I went with it", I think it will help us maintain the code better.

https://github.com/phetsims/utterance-queue/blob/0734faef02fd4e5f4f666cac26cf969c7b1a7767/js/SpeechSynthesisAnnouncer.ts#L597-L599

It would be nice to talk about this complexity, from the looks of it it appears we need to be graceful to this function being called multiple times? But I'm not sure why it (and cancel() has to look like this). More generally, I see pendingSpeechSynthesisUtteranceWrapper used 18 times in the file. That feels pretty extensive and I wonder if we can simplify it now that we understand the whole problem a bit more.

Maybe it would be best to discuss this further.

zepumph commented 2 years ago

Index: js/SpeechSynthesisAnnouncer.ts
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/js/SpeechSynthesisAnnouncer.ts b/js/SpeechSynthesisAnnouncer.ts
--- a/js/SpeechSynthesisAnnouncer.ts    (revision 366673277fbae0ca372bc55f6e6f56ea9e48002e)
+++ b/js/SpeechSynthesisAnnouncer.ts    (date 1654884892112)
@@ -332,7 +332,8 @@
       // start counting up until the synth has finished speaking its current utterance.
       this.timeSinceUtteranceEnd = synth.speaking ? 0 : this.timeSinceUtteranceEnd + dt;

-      this.timeSincePendingUtterance = this.pendingSpeechSynthesisUtteranceWrapper ? this.timeSincePendingUtterance + dt : 0;
+      this.timeSincePendingUtterance = this.speakingSpeechSynthesisUtteranceWrapper && this.speakingSpeechSynthesisUtteranceWrapper.pending ?
+                                       this.timeSincePendingUtterance + dt : 0;

       if ( this.timeSincePendingUtterance > PENDING_UTTERANCE_DELAY ) {
         assert && assert( this.pendingSpeechSynthesisUtteranceWrapper, 'should have this.pendingSpeechSynthesisUtteranceWrapper' );
@@ -350,7 +351,7 @@
       // see documentation for the constant for more information. By setting readyToAnnounce in the step function
       // we also don't have to rely at all on the SpeechSynthesisUtterance 'end' event, which is inconsistent on
       // certain platforms. Also, not ready to announce if we are waiting for the synth to start speaking something.
-      if ( this.timeSinceUtteranceEnd > VOICING_UTTERANCE_INTERVAL && !this.pendingSpeechSynthesisUtteranceWrapper ) {
+      if ( this.timeSinceUtteranceEnd > VOICING_UTTERANCE_INTERVAL && !( this.speakingSpeechSynthesisUtteranceWrapper && this.speakingSpeechSynthesisUtteranceWrapper.pending ) ) {
         this.readyToAnnounce = true;
       }

@@ -489,7 +490,7 @@
       // Important that the pendingSpeechSynthesisUtteranceWrapper is cleared in the start event instead of when `synth.speaking` is
       // set to true because `synth.speaking` is incorrectly set to true before there is successful speech in ChromeOS.
       // See https://github.com/phetsims/utterance-queue/issues/66 and https://github.com/phetsims/utterance-queue/issues/64
-      this.pendingSpeechSynthesisUtteranceWrapper = null;
+      this.speakingSpeechSynthesisUtteranceWrapper.pending = false;
       this.currentlySpeakingUtterance = utterance;

       // Interrupt if the Utterance can no longer be announced.
@@ -536,9 +537,6 @@
     // See https://github.com/phetsims/utterance-queue/issues/40
     this.timeSinceUtteranceEnd = 0;

-    // Utterance is pending until we get a successful 'start' event on the SpeechSynthesisUtterance
-    this.pendingSpeechSynthesisUtteranceWrapper = speechSynthesisUtteranceWrapper;
-
     this.getSynth()!.speak( speechSynthUtterance );
   }

@@ -579,7 +577,6 @@
   cancel(): void {
     if ( this.initialized ) {
       const utteranceToCancel = this.speakingSpeechSynthesisUtteranceWrapper ? this.speakingSpeechSynthesisUtteranceWrapper.utterance :
-                                this.pendingSpeechSynthesisUtteranceWrapper ? this.pendingSpeechSynthesisUtteranceWrapper.utterance :
                                 null;

       if ( utteranceToCancel ) {
@@ -594,9 +591,9 @@
    * (utterance-queue internal)
    */
   override cancelUtterance( utterance: Utterance ): void {
-    const utteranceWrapperToEnd = utterance === this.currentlySpeakingUtterance ? this.speakingSpeechSynthesisUtteranceWrapper :
-                                  ( this.pendingSpeechSynthesisUtteranceWrapper && utterance === this.pendingSpeechSynthesisUtteranceWrapper.utterance ) ? this.pendingSpeechSynthesisUtteranceWrapper :
-                                  null;
+    const utteranceWrapperToEnd = this.speakingSpeechSynthesisUtteranceWrapper ?
+                                  this.speakingSpeechSynthesisUtteranceWrapper.utterance === utterance ?
+                                  this.speakingSpeechSynthesisUtteranceWrapper : null : null;

     if ( utteranceWrapperToEnd ) {
       this.handleSpeechSynthesisEnd( utteranceWrapperToEnd.utterance.getAlertText(), utteranceWrapperToEnd );
jessegreenberg commented 2 years ago

Coming back to this, remaining work:

jessegreenberg commented 2 years ago

I got blocked by #84 last time I worked on https://github.com/phetsims/utterance-queue/issues/66#issuecomment-1185752283, Ill try finish this next.