stanford-oval / genie-toolkit

The Genie open source kit for voice assistant (formerly known as Almond)
Apache License 2.0

Support richer output #240

Open gcampax opened 4 years ago

gcampax commented 4 years ago

Our system should allow developers to specify non-textual content in the replies from the agent.

For speech, we should support sound effects, emphasis, and prosody. We might want to target Speech Markdown or SSMD, which are simplified syntaxes for SSML. Many of the modifiers are needed to speak certain types correctly, so we should also support them natively. Alternatively, we might want a simpler, more focused syntax, because tokenization might be challenging otherwise.
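
To make the "simpler / more focused syntax" option concrete, here is a rough sketch of lowering a tiny marker subset to SSML. The two markers, `(text)[emphasis:"level"]` and `[break:"time"]`, are only loosely modeled on Speech Markdown, and `escapeXml` and `toSsml` are hypothetical names, not existing genie-toolkit APIs.

```typescript
// Hypothetical helper, not part of genie-toolkit: escape characters that are special in XML.
function escapeXml(text: string): string {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Convert the assumed marker subset into an SSML document.
function toSsml(input: string): string {
    // Matches either (text)[emphasis:"level"] or [break:"time"].
    // Created per call so the global regex's lastIndex state is never shared.
    const marker = /\(([^)]*)\)\[emphasis:"(strong|moderate|reduced)"\]|\[break:"(\d+m?s)"\]/g;
    let out = '';
    let last = 0;
    let m: RegExpExecArray | null;
    while ((m = marker.exec(input)) !== null) {
        // Copy the plain text that precedes this marker.
        out += escapeXml(input.slice(last, m.index));
        if (m[1] !== undefined) {
            // Emphasis marker: group 1 is the text, group 2 the level.
            out += `<emphasis level="${m[2]}">${escapeXml(m[1])}</emphasis>`;
        } else {
            // Break marker: group 3 is the pause duration.
            out += `<break time="${m[3]}"/>`;
        }
        last = m.index + m[0].length;
    }
    // Copy whatever plain text follows the last marker.
    out += escapeXml(input.slice(last));
    return `<speak>${out}</speak>`;
}

console.log(toSsml('Your timer (is done)[emphasis:"strong"] [break:"500ms"] A & B'));
```

Keeping the subset this small means the markers can be recognized with a single regular expression, which is one way to sidestep the tokenization concern, at the cost of not covering sound effects or prosody.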

For GUI output, we should support pictures & RDL (link cards), as we used to do.

This issue is about the infrastructure code to support this and the design of the interface, so that developers can make use of richer interaction in #_[prompt] and #_[result] (and perhaps #_[canonical] if we know how to strip those non-textual markers from the user utterances). A separate issue will cover generating these outputs.
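
As an illustration of the stripping step mentioned for #_[canonical], a minimal sketch could look like the following. It assumes a hypothetical marker syntax, `(text)[emphasis:"level"]` and `[break:"time"]`, and `stripSpeechMarkers` is an invented name rather than an existing function.

```typescript
// Hypothetical helper, not part of genie-toolkit: reduce an annotated reply
// to the plain text a user would actually say or read.
function stripSpeechMarkers(input: string): string {
    // Matches (text)[emphasis:"level"] (group 1 is the text) or [break:"time"].
    const marker = /\(([^)]*)\)\[emphasis:"(?:strong|moderate|reduced)"\]|\[break:"\d+m?s"\]/g;
    return input
        // Keep the emphasized text; drop break markers entirely.
        .replace(marker, (_match: string, text?: string) => text ?? '')
        // Removing a marker can leave doubled spaces behind; collapse them.
        .replace(/\s+/g, ' ')
        .trim();
}

console.log(stripSpeechMarkers('Your timer (is done)[emphasis:"strong"] [break:"500ms"] now'));
```

The same annotated string could then feed both the speech path (with markers) and the canonical path (without them).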

gcampax commented 3 years ago

The graphical/interactive outputs were implemented in #406. I'm leaving this open to track speech modifiers, but it's no longer on the critical path.