ifrotz / iosfrotz

Frotz for iOS source (formerly at code.google.com/p/iphonefrotz)

Voice-only operation with SFSpeechRecognizer and AVSpeechSynthesizer #287

Open glangmead opened 6 years ago

glangmead commented 6 years ago

This project sparked my interest in enhancing iOS Frotz: https://www.amazon.com/Vitaly-Lishchenko-Interactive-Fiction/dp/B01IVEANGM

I bet there are hidden challenges, but the main technical pieces look feasible:

  1. SFSpeechRecognitionRequest (part of the Speech framework added in iOS 10) accepts a set of "context" strings through its contextualStrings property, which might make it more accurate at recognizing specialized vocabulary from the game (https://developer.apple.com/documentation/speech/sfspeechrecognitionrequest)

  2. AVSpeechSynthesizer gives control over the synthesized speech: the voice, rate, pitch, and volume can be set per utterance through AVSpeechUtterance.
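
To make the two points above concrete, here is a minimal sketch of how they might fit together. It is not code from Frotz: the vocabulary list, the sample sentence, and the helper names `makeRecognitionRequest` and `makeUtterance` are assumptions for illustration. Only the Speech and AVFoundation calls themselves are real API.

```swift
import Speech
import AVFoundation

// Hypothetical vocabulary; a real list would have to come from the loaded game.
let gameVocabulary = ["grue", "xyzzy", "plugh", "brass lantern"]

/// Builds a recognition request that is biased toward game-specific words.
func makeRecognitionRequest(vocabulary: [String]) -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.contextualStrings = vocabulary      // the "context" strings from point 1
    request.shouldReportPartialResults = false  // only deliver the final transcription
    return request
}

/// Builds an utterance for one chunk of interpreter output.
func makeUtterance(for text: String) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    utterance.pitchMultiplier = 1.0
    return utterance
}

let request = makeRecognitionRequest(vocabulary: gameVocabulary)
let synthesizer = AVSpeechSynthesizer()
// Sample text only; in the app this would be the interpreter's reply.
synthesizer.speak(makeUtterance(for: "You are standing in an open field."))
```

The request on its own does nothing yet. The app would still need to ask for speech recognition and microphone permission, feed microphone buffers into the request, and start a recognition task on an SFSpeechRecognizer.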

Possible hidden challenges:

  1. Handling requests from the user to repeat something. Must we reread the whole machine reply or can we break it up into sections or paragraphs or sentences and let the user ask for something specific?

  2. Giving good audio and visual feedback about the state of the interpreter and its voice recognition. Is it listening? Is it processing? What actions are available right now?

  3. App navigation, such as saving and quitting or opening a different game.
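
For the first challenge, one possible approach is to keep the last reply split into paragraphs so a "repeat" request can target a single piece instead of the whole reply. The sketch below assumes paragraphs in the interpreter output are separated by blank lines; `ReplyBuffer` and its methods are hypothetical names, not part of Frotz.

```swift
import Foundation

/// Holds the most recent interpreter reply, split into paragraphs.
struct ReplyBuffer {
    var paragraphs: [String] = []

    /// Splits a reply on blank lines and drops empty pieces.
    mutating func store(_ reply: String) {
        paragraphs = reply
            .components(separatedBy: "\n\n")
            .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
            .filter { !$0.isEmpty }
    }

    /// Returns paragraph `number` (1-based, as a user would say it), or nil if out of range.
    func paragraph(_ number: Int) -> String? {
        guard number >= 1 && number <= paragraphs.count else { return nil }
        return paragraphs[number - 1]
    }

    /// The final paragraph, which is often the part the user wants to hear again.
    var lastParagraph: String? { paragraphs.last }
}

var buffer = ReplyBuffer()
buffer.store("First paragraph.\n\nSecond paragraph.\n\n\nThird paragraph.")
```

With something like this in place, "repeat" could reread every paragraph in order, while "repeat the last part" could speak only `buffer.lastParagraph`.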

glangmead commented 6 years ago

Some of the above work could also serve to implement a SiriKit extension inside Frotz, which in turn can allow the HomePod to participate: https://developer.apple.com/sirikit/

spathiwa commented 6 years ago

That's pretty cool. I love the idea of controlling Frotz with a HomePod. Hopefully it can be much less cumbersome than "Alexa, ask Interactive Fiction to...".

ncalexan commented 6 years ago

Has anybody made progress in this direction? I'm not personally interested in speech output, but I am interested in speech recognition for input. I wonder if the Apple APIs make it possible to implement a keyword (like "Hey Siri" or "Hey Google") that is specific to the current App, which could start the recognition listening for one input. That might make for the smallest possible technical demo.
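
As a rough sketch of the keyword idea, and assuming the app is already in the foreground and transcribing speech, the keyword could be handled as plain text matching on the transcription rather than through a dedicated API. The phrase "hey frotz" and the function name below are made up for illustration.

```swift
import Foundation

/// Returns the command that follows the wake phrase, or nil if the
/// transcript does not start with the phrase or nothing follows it.
func command(in transcript: String, after wakePhrase: String) -> String? {
    let text = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
    guard text.lowercased().hasPrefix(wakePhrase.lowercased()) else { return nil }

    let remainder = text.dropFirst(wakePhrase.count)
    // Reject matches that end in the middle of a word, e.g. "hey frotzen".
    if let next = remainder.first, next.isLetter || next.isNumber { return nil }

    // Strip the separator between the phrase and the command.
    let separators = CharacterSet(charactersIn: ",.").union(.whitespacesAndNewlines)
    let rest = remainder.trimmingCharacters(in: separators)
    return rest.isEmpty ? nil : rest
}

let parsed = command(in: "Hey Frotz, go north", after: "hey frotz")
```

This only covers the text side of the demo. Whether the Apple APIs allow listening for such a phrase continuously is a separate question that this sketch does not answer.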