mapbox / mapbox-navigation-ios

Turn-by-turn navigation logic and UI in Swift on iOS
https://docs.mapbox.com/ios/navigation/

Feature request: Allow apps to reliably synchronize speech synthesis with the SDK #1598

Open · macdrevx opened this issue 6 years ago

macdrevx commented 6 years ago

Mapbox Navigation SDK version: 0.18.1

In our app, we've written the code for the map screen such that the audio cues, visual cues, and starting/stopping navigation are driven independently. This has worked well for us because it allows us to keep the complexity of the screen in check.

It does, however, make it challenging to reliably synchronize our supplemental audio cues with the ones generated by the SDK. For example: before starting navigation, we currently check whether our speech synthesizer is speaking before presenting the NavigationViewController, and if it is, we wait until it finishes. This approach works most of the time but not always, because we have no guarantee that the audio announcement won't be triggered after the NavigationViewController has been shown. I've seen this happen, and it results in two different audio announcements talking over each other.
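Roughly, the check we do today looks like the sketch below. The `startNavigationTapped()` entry point and `presentNavigation()` helper are placeholders for our own code; only the `AVSpeechSynthesizer`/`AVSpeechSynthesizerDelegate` APIs are Apple's.

```swift
import AVFoundation
import UIKit

final class MapScreenViewController: UIViewController, AVSpeechSynthesizerDelegate {
    private let speechSynthesizer = AVSpeechSynthesizer()
    private var pendingNavigationStart: (() -> Void)?

    override func viewDidLoad() {
        super.viewDidLoad()
        speechSynthesizer.delegate = self
    }

    /// Called when the user taps "Start navigation".
    func startNavigationTapped() {
        if speechSynthesizer.isSpeaking {
            // Defer presentation until our own announcement has finished.
            pendingNavigationStart = { [weak self] in self?.presentNavigation() }
        } else {
            presentNavigation()
        }
    }

    private func presentNavigation() {
        // Present the SDK's NavigationViewController here (omitted for brevity).
    }

    // MARK: AVSpeechSynthesizerDelegate

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        pendingNavigationStart?()
        pendingNavigationStart = nil
    }
}
```

Even with this in place, there is still a window after presentation in which one of our cues can collide with the SDK's first announcement, which is the race described above.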

One approach we've considered is to use the VoiceControllerDelegate to add our additional audio cue to the ones from the SDK. The problem is that there's also no guarantee that our additional audio cue will be triggered before those delegate methods are called. If our cue is only triggered after the delegate method has fired, it's too late for us to customize the SDK's message.
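For context, that delegate-based approach would look something like the sketch below. It assumes VoiceControllerDelegate exposes a will-speak hook along the lines of `voiceController(_:willSpeak:routeProgress:)` that lets the app substitute the instruction to be spoken; the exact method signature and return semantics may differ between SDK versions, so treat the specifics as illustrative only.

```swift
import MapboxCoreNavigation
import MapboxDirections
import MapboxNavigation

final class VoiceCoordinator: NSObject, VoiceControllerDelegate {
    /// Our supplemental cue, if one is pending; set elsewhere in the app.
    var pendingCueText: String?

    // Assumed will-speak hook -- verify the exact signature against your SDK version.
    func voiceController(_ voiceController: RouteVoiceController,
                         willSpeak instruction: SpokenInstruction,
                         routeProgress: RouteProgress) -> SpokenInstruction? {
        guard pendingCueText != nil else {
            return nil // Assumed to mean "speak the SDK's instruction unmodified".
        }
        // Ideally we would return an instruction whose text prepends our cue,
        // e.g. "\(pendingCueText!) \(instruction.text)". But that only helps if
        // our cue was set *before* this callback fires -- exactly the ordering
        // we can't guarantee today.
        pendingCueText = nil
        return nil // Placeholder: building a modified SpokenInstruction is SDK-version-specific.
    }
}
```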

To solve this reliably, we'd need to combine our audio cue logic with our navigation logic, which would significantly increase the complexity of our codebase. It's not that it's impossible; it's that it forces us down an architectural path that is inconsistent with the rest of our code and unfamiliar to the developers on our team.

This seems especially undesirable since instances of AVSpeechSynthesizer already do the exact thing we want, namely queuing up utterances so that they aren't spoken over each other. Given that, I imagine a couple possible solutions:

  1. Allow developers to inject an instance of AVSpeechSynthesizer into the SDK, and update the SDK to avoid doing things like calling speechSynth.stopSpeaking(at: .immediate) in deinit. (A rough sketch of what this might look like follows this list.)

  2. Update the SDK to optionally delegate speaking tasks to the app. Apps that opt in could then manage the utterances generated by the SDK themselves. This seems like a more complicated change to me.
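To make option 1 concrete, here is one hypothetical shape it could take. None of this is current SDK API; `RouteVoiceController(speechSynthesizer:)` is an imagined initializer, and the point is only that the app and the SDK would share a single AVSpeechSynthesizer whose internal queue serializes utterances from both sources.

```swift
import AVFoundation
import MapboxNavigation

// Hypothetical -- nothing below is current SDK API.
// The app owns a single AVSpeechSynthesizer...
let sharedSynthesizer = AVSpeechSynthesizer()

// ...hands it to the SDK via an imagined initializer parameter...
let voiceController = RouteVoiceController(speechSynthesizer: sharedSynthesizer)

// ...and uses the same instance for its own supplemental cues.
// Because AVSpeechSynthesizer queues utterances, the SDK's announcements and
// ours would be spoken one after another rather than over each other.
let cue = AVSpeechUtterance(string: "Heads up: school zone ahead.")
sharedSynthesizer.speak(cue)

// The SDK would also need to stop calling
// sharedSynthesizer.stopSpeaking(at: .immediate) in its deinit,
// since that would cut off utterances the app itself has queued.
```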

1ec5 commented 6 years ago

Synchronizing speech synthesis is a cool idea! Note that this SDK defaults to the Mapbox Voice API (powered by Amazon Polly), falling back to AVSpeechSynthesizer when Polly is inaccessible or incapable of producing the required speech. I take it you’re mainly interested in synchronizing to the beginning and end of utterances, not to individual words within an utterance, correct? (Polly does have some support for synchronizing text based on the <mark> SSML tag, but we currently treat the SSML that comes from the Directions API as a black box.)

/cc @JThramer @bsudekum

macdrevx commented 6 years ago

@1ec5 Thanks for the clarification on Polly vs AVSpeechSynthesizer. You're right that our goal is merely to ensure that our AVSpeechSynthesizer doesn't talk over the SDK's speech synthesis (or vice versa).