Combine pendingSpeechSynthesisUtteranceWrapper and currentlySpeakingUtterance in SpeechSynthesisAnnouncer

jessegreenberg commented 1 year ago

We have a pendingSpeechSynthesisUtteranceWrapper - with a reference to an Utterance that was just used to speak, but we are waiting for the browser to actually speak it. And the currentlySpeakingUtterance, a reference to the Utterance that is actually being spoken by the SpeechSynthesisAnnouncer.

@zepumph and I think that we only need a reference to one at a time, and that managing both is really complicating things. See https://github.com/phetsims/ratio-and-proportion/issues/533 and the changes in that issue that would be so much simpler if there were only one Utterance to worry about.

zepumph commented 1 year ago

Work in progress but many manualInput unit tests are failing. We ran out of time to investigate fully.

```diff Index: scenery/js/accessibility/voicing/voicingManager.ts IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/scenery/js/accessibility/voicing/voicingManager.ts b/scenery/js/accessibility/voicing/voicingManager.ts --- a/scenery/js/accessibility/voicing/voicingManager.ts (revision 17bb534c507613a8103e5c7b981dbfacddcc4866) +++ b/scenery/js/accessibility/voicing/voicingManager.ts (date 1670000100677) @@ -26,7 +26,7 @@ // All VoicingManager instances should respect responseCollector's current state. respectResponseCollectorProperties: true, - + debug: true, // phet-io tandem: Tandem.OPTIONAL, phetioDocumentation: 'Announcer that manages the voicing feature, providing audio responses via WebAudio.' Index: utterance-queue/js/SpeechSynthesisAnnouncer.ts IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/utterance-queue/js/SpeechSynthesisAnnouncer.ts b/utterance-queue/js/SpeechSynthesisAnnouncer.ts --- a/utterance-queue/js/SpeechSynthesisAnnouncer.ts (revision 07e4e946b9530cc73c30e1cd71a05cc36c716300) +++ b/utterance-queue/js/SpeechSynthesisAnnouncer.ts (date 1670000199788) @@ -197,6 +197,13 @@ // A references is kept so that we can remove listeners // from the SpeechSynthesisUtterance when the voicingManager finishes speaking the Utterance. + // Only public for unit tests! A reference to the utterance currently in the synth + // being spoken by the browser, so we can determine cancelling behavior when it is time to speak the next utterance. + // See voicing's supported announcerOptions for details. + // A reference to the Utterance that is about to be spoken. Cleared the moment + // speech starts (the start event of the SpeechSynthesisUtterance). Depending on the platform there may be + // a delay between the speak() call and when the synth actually starts speaking. + // TODO: private speakingSpeechSynthesisUtteranceWrapper: SpeechSynthesisUtteranceWrapper | null; // is the VoicingManager initialized for use? This is prototypal so it isn't always initialized @@ -214,16 +221,6 @@ // Bound so that the listener can be added and removed on Utterances without creating many closures. private readonly boundHandleCanAnnounceChange: ( canAnnounce: boolean ) => void; - // Only public for unit tests! A reference to the utterance currently in the synth - // being spoken by the browser, so we can determine cancelling behavior when it is time to speak the next utterance. - // See voicing's supported announcerOptions for details. - private currentlySpeakingUtterance: Utterance | null; - - // A reference to the Utterance that is about to be spoken. Cleared the moment - // speech starts (the start event of the SpeechSynthesisUtterance). Depending on the platform there may be - // a delay between the speak() call and when the synth actually starts speaking. - private pendingSpeechSynthesisUtteranceWrapper: SpeechSynthesisUtteranceWrapper | null; - // Switch to true to enable debugging features (like logging) private readonly debug: boolean; @@ -312,12 +309,10 @@ this.canSpeakProperty = null; this.boundHandleCanSpeakChange = this.handleCanSpeakChange.bind( this ); this.boundHandleCanAnnounceChange = this.handleCanAnnounceChange.bind( this ); - this.currentlySpeakingUtterance = null; - this.pendingSpeechSynthesisUtteranceWrapper = null; if ( this.debug ) { this.announcementCompleteEmitter.addListener( ( utterance, string ) => { - console.log( 'announcement complete', string ); + console.log( 'announcement complete', string, new Error().stack ); } ); this.startSpeakingEmitter.addListener( string => { this.debug && console.log( 'startSpeakingListener', string ); @@ -416,14 +411,16 @@ // start counting up until the synth has finished speaking its current utterance. this.timeSinceUtteranceEnd = synth.speaking ? 0 : this.timeSinceUtteranceEnd + dt; - this.timeSincePendingUtterance = this.pendingSpeechSynthesisUtteranceWrapper ? this.timeSincePendingUtterance + dt : 0; + + this.timeSincePendingUtterance = ( this.speakingSpeechSynthesisUtteranceWrapper && !this.speakingSpeechSynthesisUtteranceWrapper.started ) ? + this.timeSincePendingUtterance + dt : 0; if ( this.timeSincePendingUtterance > PENDING_UTTERANCE_DELAY ) { - assert && assert( this.pendingSpeechSynthesisUtteranceWrapper, 'should have this.pendingSpeechSynthesisUtteranceWrapper' ); + assert && assert( this.speakingSpeechSynthesisUtteranceWrapper, 'should have this.speakingSpeechSynthesisUtteranceWrapper' ); // It has been too long since we requested speech without speaking, the synth is likely failing on this platform - this.handleSpeechSynthesisEnd( this.pendingSpeechSynthesisUtteranceWrapper!.announceText, this.pendingSpeechSynthesisUtteranceWrapper! ); - this.pendingSpeechSynthesisUtteranceWrapper = null; + this.handleSpeechSynthesisEnd( this.speakingSpeechSynthesisUtteranceWrapper!.announceText, this.speakingSpeechSynthesisUtteranceWrapper! ); + this.speakingSpeechSynthesisUtteranceWrapper = null; // cancel the synth because we really don't want it to keep trying to speak this utterance after handling // the assumed failure @@ -434,7 +431,7 @@ // see documentation for the constant for more information. By setting readyToAnnounce in the step function // we also don't have to rely at all on the SpeechSynthesisUtterance 'end' event, which is inconsistent on // certain platforms. Also, not ready to announce if we are waiting for the synth to start speaking something. - if ( this.timeSinceUtteranceEnd > VOICING_UTTERANCE_INTERVAL && !this.pendingSpeechSynthesisUtteranceWrapper ) { + if ( this.timeSinceUtteranceEnd > VOICING_UTTERANCE_INTERVAL && !this.speakingSpeechSynthesisUtteranceWrapper ) { this.readyToAnnounce = true; } @@ -582,14 +579,8 @@ const startListener = () => { this.startSpeakingEmitter.emit( stringToSpeak, utterance ); - // Important that the pendingSpeechSynthesisUtteranceWrapper is cleared in the start event instead of when `synth.speaking` is - // set to true because `synth.speaking` is incorrectly set to true before there is successful speech in ChromeOS. - // See https://github.com/phetsims/utterance-queue/issues/66 and https://github.com/phetsims/utterance-queue/issues/64 - this.pendingSpeechSynthesisUtteranceWrapper = null; - this.currentlySpeakingUtterance = utterance; - - assert && assert( this.speakingSpeechSynthesisUtteranceWrapper === null, 'Wrapper should be null, we should have received an end event to clear it.' ); - this.speakingSpeechSynthesisUtteranceWrapper = speechSynthesisUtteranceWrapper; + assert && assert( this.speakingSpeechSynthesisUtteranceWrapper, 'should have been set in requestSpeech' ); + this.speakingSpeechSynthesisUtteranceWrapper!.started = true; speechSynthUtterance.removeEventListener( 'start', startListener ); }; @@ -598,9 +589,6 @@ this.handleSpeechSynthesisEnd( stringToSpeak, speechSynthesisUtteranceWrapper ); }; - speechSynthUtterance.addEventListener( 'start', startListener ); - speechSynthUtterance.addEventListener( 'end', endListener ); - // Keep a reference to the SpeechSynthesisUtterance and the endListener so that we can remove the listener later. // Notice this is used in the function scopes above. // IMPORTANT NOTE: Also, this acts as a workaround for a Safari bug where the `end` event does not fire @@ -608,7 +596,13 @@ // will fail to emit that event. See // https://stackoverflow.com/questions/23483990/speechsynthesis-api-onend-callback-not-working and // https://github.com/phetsims/john-travoltage/issues/435 and https://github.com/phetsims/utterance-queue/issues/52 - const speechSynthesisUtteranceWrapper = new SpeechSynthesisUtteranceWrapper( utterance, announceText, speechSynthUtterance, endListener ); + const speechSynthesisUtteranceWrapper = new SpeechSynthesisUtteranceWrapper( utterance, announceText, speechSynthUtterance, false, endListener ); + + assert && assert( this.speakingSpeechSynthesisUtteranceWrapper === null, 'Wrapper should be null, we should have received an end event to clear it before the next one.' ); + this.speakingSpeechSynthesisUtteranceWrapper = speechSynthesisUtteranceWrapper; + + speechSynthUtterance.addEventListener( 'start', startListener ); + speechSynthUtterance.addEventListener( 'end', endListener ); // In Safari the `end` listener does not fire consistently, (especially after cancel) // but the error event does. In this case signify that speaking has ended. @@ -623,9 +617,6 @@ // See https://github.com/phetsims/utterance-queue/issues/40 this.timeSinceUtteranceEnd = 0; - // Utterance is pending until we get a successful 'start' event on the SpeechSynthesisUtterance - this.pendingSpeechSynthesisUtteranceWrapper = speechSynthesisUtteranceWrapper; - // Interrupt if the Utterance can no longer be announced. If this happens before the "pending" utterance can // start speaking, it will never speak. If it happens while the currentlySpeakingUtterance is speaking, it will // be interrupted mid-speech. @@ -639,11 +630,8 @@ * When a canAnnounceProperty changes to false for an Utterance, that utterance should be cancelled. */ private handleCanAnnounceChange(): void { - if ( this.currentlySpeakingUtterance ) { - this.cancelUtteranceIfCanAnnounceFalse( this.currentlySpeakingUtterance ); - } - if ( this.pendingSpeechSynthesisUtteranceWrapper ) { - this.cancelUtteranceIfCanAnnounceFalse( this.pendingSpeechSynthesisUtteranceWrapper.utterance ); + if ( this.speakingSpeechSynthesisUtteranceWrapper ) { + this.cancelUtteranceIfCanAnnounceFalse( this.speakingSpeechSynthesisUtteranceWrapper.utterance ); } } @@ -678,8 +666,6 @@ } this.speakingSpeechSynthesisUtteranceWrapper = null; - this.pendingSpeechSynthesisUtteranceWrapper = null; - this.currentlySpeakingUtterance = null; } /** @@ -697,13 +683,7 @@ */ public cancel(): void { if ( this.initialized ) { - const utteranceToCancel = this.speakingSpeechSynthesisUtteranceWrapper ? this.speakingSpeechSynthesisUtteranceWrapper.utterance : - this.pendingSpeechSynthesisUtteranceWrapper ? this.pendingSpeechSynthesisUtteranceWrapper.utterance : - null; - - if ( utteranceToCancel ) { - this.cancelUtterance( utteranceToCancel ); - } + this.speakingSpeechSynthesisUtteranceWrapper && this.cancelUtterance( this.speakingSpeechSynthesisUtteranceWrapper.utterance ); } } @@ -713,12 +693,11 @@ * (utterance-queue internal) */ public override cancelUtterance( utterance: Utterance ): void { - const utteranceWrapperToEnd = utterance === this.currentlySpeakingUtterance ? this.speakingSpeechSynthesisUtteranceWrapper : - ( this.pendingSpeechSynthesisUtteranceWrapper && utterance === this.pendingSpeechSynthesisUtteranceWrapper.utterance ) ? this.pendingSpeechSynthesisUtteranceWrapper : - null; - if ( utteranceWrapperToEnd ) { - this.handleSpeechSynthesisEnd( utteranceWrapperToEnd.announceText, utteranceWrapperToEnd ); + const wrapper = this.speakingSpeechSynthesisUtteranceWrapper; + + if ( wrapper && utterance === wrapper.utterance ) { + this.handleSpeechSynthesisEnd( wrapper.announceText, wrapper ); // silence all speech - after handleSpeechSynthesisEnd so we don't do that work twice in case `cancelSynth` // also triggers end events immediately (but that doesn't happen on all browsers) @@ -756,14 +735,10 @@ */ public override onUtterancePriorityChange( nextAvailableUtterance: Utterance ): void { - // test against what is currently being spoken by the synth (currentlySpeakingUtterance) - if ( this.currentlySpeakingUtterance && this.shouldUtteranceCancelOther( nextAvailableUtterance, this.currentlySpeakingUtterance ) ) { - this.cancelUtterance( this.currentlySpeakingUtterance ); - } - - // test against what is pending to be spoken by the synth (pendingSpeechSynthesisUtteranceWrapper) - if ( this.pendingSpeechSynthesisUtteranceWrapper && this.shouldUtteranceCancelOther( nextAvailableUtterance, this.pendingSpeechSynthesisUtteranceWrapper.utterance ) ) { - this.cancelUtterance( this.pendingSpeechSynthesisUtteranceWrapper.utterance ); + // test against what is currently being spoken by the synth + const wrapper = this.speakingSpeechSynthesisUtteranceWrapper; + if ( wrapper && this.shouldUtteranceCancelOther( nextAvailableUtterance, wrapper.utterance ) ) { + this.cancelUtterance( wrapper.utterance ); } } @@ -778,8 +753,8 @@ /** * Returns true if SpeechSynthesis is available on the window. This check is sufficient for all of - * voicingManager. On platforms where speechSynthesis is available, all features of it are available, with the - * exception of the onvoiceschanged event in a couple of platforms. However, the listener can still be set + * voicingManager. On platforms where speechSynthesis is available, all features of it are available, except for the + * onvoiceschanged event in a couple of platforms. However, the listener can still be set * without issue on those platforms so we don't need to check for its existence. On those platforms, voices * are provided right on load. */ @@ -797,6 +772,7 @@ public constructor( public readonly utterance: Utterance, public readonly announceText: ResolvedResponse, public readonly speechSynthesisUtterance: SpeechSynthesisUtterance, + public started: boolean, public readonly endListener: () => void ) { } } Index: scenery/js/accessibility/voicing/voicingUtteranceQueue.ts IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/scenery/js/accessibility/voicing/voicingUtteranceQueue.ts b/scenery/js/accessibility/voicing/voicingUtteranceQueue.ts --- a/scenery/js/accessibility/voicing/voicingUtteranceQueue.ts (revision 17bb534c507613a8103e5c7b981dbfacddcc4866) +++ b/scenery/js/accessibility/voicing/voicingUtteranceQueue.ts (date 1670000100680) @@ -12,7 +12,8 @@ import { scenery, voicingManager } from '../../imports.js'; const voicingUtteranceQueue = new UtteranceQueue( voicingManager, { - featureSpecificAnnouncingControlPropertyName: 'voicingCanAnnounceProperty' + featureSpecificAnnouncingControlPropertyName: 'voicingCanAnnounceProperty', + debug:true } ); // voicingUtteranceQueue should be disabled until requested

jessegreenberg commented 1 year ago

Alright, the only reason the unit tests were failing is because we needed to remove the references to the currentlySpeakingUtterance. I cleaned up the patch and pushed. Once I got the tests passing in the browser I saw them fail in precommit hooks a few times but then they started to pass and I don't know why. Not sure what to do about it but thought I would mention.

@zepumph would you like to review the final commit?

zepumph commented 1 year ago

Perfect! Thank you. I do not think this needs cherry picking over in https://github.com/phetsims/ratio-and-proportion/issues/533, but this is definitely an improvement. Closing

zepumph commented 1 year ago

Oops, it looks like we have a bug. Click the audio enabled button twice in a row. I think we need to mess with speakIgnoringEnabled from this change.

Assertion Failed: Wrapper should be null, we should have received an end event to clear it before the next one.

zepumph commented 1 year ago

I discussed with @jessegreenberg and we feel good. Closing

zepumph commented 1 year ago

There seems to be one more race condition that we don't have a handle yet. I cannot reproduce, but it seems to be directly related to this issue.

I think I caused it like so:

Open preferences and turn voicing on
click out of preferences and quickly open keyboard help dialog

The race condition is that the addToBack of "keyboard shortcuts" occurs before the startListener of preferences, so it is cancelled, and then the startListener asserts out because it no longer has a speakingSpeechSynthesisUtteranceWrapper.

Here is some data from my runtime before I refresh and lose it:

From the console: synth is speaking, next utterance in the queue is the "keyboard" but speakingSpeechSynthesisUtteranceWrapper is null because we have already called cancel (clearing before we get the startListener callback).

Should we just be graceful in the startlistener? Or do we want to try to code around this case?

jessegreenberg commented 1 year ago

I just hit this while trying to fix https://github.com/phetsims/friction/issues/314. Here is the patch where this happens consistently. Just open and close the Preferences dialog a few times with a keyboard. Doing it quickly is important so that we get the cancel before we get the start listener.

Index: js/Sim.ts
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/js/Sim.ts b/js/Sim.ts
--- a/js/Sim.ts (revision 1c4aa3e3661068b54d1e623a1c33cf134aa1188d)
+++ b/js/Sim.ts (date 1672849988249)
@@ -820,8 +820,10 @@
       // pdom - modal dialogs should be the only readable content in the sim
       this.setPDOMViewsVisible( false );

-      // voicing - responses from Nodes hidden by the modal dialog should not voice.
+      // voicing - Responses from Nodes hidden by the modal dialog should not voice. Contents of the popup
+      // should voice if "Sim Voicing" is enabled.
       this.setNonModalVoicingVisible( false );
+      this.topLayer.voicingVisibleProperty.value = voicingManager.voicingFullyEnabledProperty.value;
     }
     if ( popup.layout ) {
       popup.layout( this.screenBoundsProperty.value! );
@@ -842,8 +844,9 @@
       if ( this.modalNodeStack.length === 0 ) {

         // After hiding all popups, Voicing becomes enabled for components in the simulation window only if
-        // "Sim Voicing" switch is on.
+        // "Sim Voicing" switch is on. C
         this.setNonModalVoicingVisible( voicingManager.voicingFullyEnabledProperty.value );
+        this.topLayer.voicingVisibleProperty.value = false;

         // pdom - when the dialog is hidden, make all ScreenView content visible to assistive technology
         this.setPDOMViewsVisible( true );

jessegreenberg commented 1 year ago

synth is speaking, next utterance in the queue is the "keyboard" but speakingSpeechSynthesisUtteranceWrapper is null because we have already called cancel (clearing before we get the startListener callback).

Thank you for investigating this, that makes sense. It looks like we aren't removing the startListener on cancel (if it hasn't fired yet), I think we should be doing that.

jessegreenberg commented 1 year ago

I believe that the above commit fixes this. I am not committing to the joist change above but used it to verify that the problem is gone. @zepumph can you please review this change?

zepumph commented 1 year ago

Looks really good! I also cherry picked this into RAP 1.2 and confirmed it was fixed. Thanks!

phetsims / utterance-queue

Combine pendingSpeechSynthesisUtteranceWrapper and currentlySpeakingUtterance in SpeechSynthesisAnnouncer #97