React Native plugin for adding voice using Spokestack. This includes speech recognition, wakeword, and natural language understanding, as well as synthesizing text to speech using Spokestack voices.
Using npm:

```sh
npm install --save react-native-spokestack
```

or using yarn:

```sh
yarn add react-native-spokestack
```
Then follow the instructions for each platform to link react-native-spokestack to your project:
Get started using Spokestack, or check out our in-depth tutorials on ASR, NLU, and TTS. Also be sure to take a look at the Cookbook for quick solutions to common problems.
A working example app is included in this repo in the example/ folder.
```js
import Spokestack from 'react-native-spokestack'
import React, { useEffect, useState } from 'react'
import { View, Button, Text } from 'react-native'

function App() {
  const [listening, setListening] = useState(false)

  const onActivate = () => setListening(true)
  const onDeactivate = () => setListening(false)
  const onRecognize = ({ transcript }) => console.log(transcript)

  useEffect(() => {
    Spokestack.addEventListener('activate', onActivate)
    Spokestack.addEventListener('deactivate', onDeactivate)
    Spokestack.addEventListener('recognize', onRecognize)
    Spokestack.initialize(
      process.env.SPOKESTACK_CLIENT_ID,
      process.env.SPOKESTACK_CLIENT_SECRET
    )
      // This example starts the Spokestack pipeline immediately,
      // but it could be delayed until after onboarding or other
      // conditions have been met.
      .then(Spokestack.start)

    return () => {
      Spokestack.removeAllListeners()
    }
  }, [])

  return (
    <View>
      <Button onPress={() => Spokestack.activate()} title="Listen" />
      <Text>{listening ? 'Listening...' : 'Idle'}</Text>
    </View>
  )
}
```
To include model files locally in your app (rather than downloading them from a CDN), you also need to add the necessary extensions so the files can be bundled as assets by the Metro bundler. To do this, edit your metro.config.js:
```js
const defaults = require('metro-config/src/defaults/defaults')

module.exports = {
  resolver: {
    assetExts: defaults.assetExts.concat(['tflite', 'txt', 'sjson'])
  }
}
```
Then include model files using source objects:
```js
Spokestack.initialize(clientId, clientSecret, {
  wakeword: {
    filter: require('./filter.tflite'),
    detect: require('./detect.tflite'),
    encode: require('./encode.tflite')
  },
  nlu: {
    model: require('./nlu.tflite'),
    vocab: require('./vocab.txt'),
    // Be sure not to use "json" here.
    // We use a different extension (.sjson) so that the file is not
    // immediately parsed as json and instead
    // passes a require source object to Spokestack.
    // The special extension is only necessary for local files.
    metadata: require('./metadata.sjson')
  }
})
```
Including model files locally is not required. Pass remote URLs to the same config options and the files will be downloaded and cached when first calling initialize.
See the contributing guide to learn how to contribute to the repository and the development workflow.
▸ initialize(clientId, clientSecret, config?): Promise&lt;void&gt;
Initialize the speech pipeline; required for all other methods.
The first 2 args are your Spokestack credentials available for free from https://spokestack.io. Avoid hardcoding these in your app. There are several ways to include environment variables in your code.
Using process.env: https://babeljs.io/docs/en/babel-plugin-transform-inline-environment-variables/
Using a local .env file ignored by git: https://github.com/goatandsheep/react-native-dotenv https://github.com/luggit/react-native-config
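For example, with react-native-dotenv added to your Babel config, the credentials can be imported from a git-ignored .env file. This is a sketch; the variable names and the '@env' module name depend on your react-native-dotenv setup.

```js
// .env (git-ignored) would define:
//   SPOKESTACK_CLIENT_ID=...
//   SPOKESTACK_CLIENT_SECRET=...
import { SPOKESTACK_CLIENT_ID, SPOKESTACK_CLIENT_SECRET } from '@env'
import Spokestack from 'react-native-spokestack'

Spokestack.initialize(SPOKESTACK_CLIENT_ID, SPOKESTACK_CLIENT_SECRET)
```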
See SpokestackConfig for all available options.
example
```js
import Spokestack from 'react-native-spokestack'
// ...
await Spokestack.initialize(process.env.CLIENT_ID, process.env.CLIENT_SECRET, {
  pipeline: {
    profile: Spokestack.PipelineProfile.PTT_NATIVE_ASR
  }
})
```
Name | Type |
---|---|
clientId | string |
clientSecret | string |
config? | [SpokestackConfig](#SpokestackConfig) |

Returns: Promise&lt;void&gt;
▸ destroy(): Promise&lt;void&gt;
Destroys the speech pipeline, removes all listeners, and frees up all resources.
This can be called before re-initializing the pipeline.
A good place to call this is in componentWillUnmount.
example
```js
componentWillUnmount() {
  Spokestack.destroy()
}
```
Returns: Promise&lt;void&gt;
▸ start(): Promise&lt;void&gt;
Start the speech pipeline.
The speech pipeline starts in the deactivated state.
example
```js
import Spokestack from 'react-native-spokestack'
// ...
Spokestack.initialize(process.env.CLIENT_ID, process.env.CLIENT_SECRET)
  .then(Spokestack.start)
```
Returns: Promise&lt;void&gt;
▸ stop(): Promise&lt;void&gt;
Stop the speech pipeline. This effectively stops ASR, VAD, and wakeword.
example
```js
import Spokestack from 'react-native-spokestack'
// ...
await Spokestack.stop()
```
Returns: Promise&lt;void&gt;
▸ activate(): Promise&lt;void&gt;
Manually activate the speech pipeline. This is necessary when using a PTT profile. VAD profiles can also activate ASR without the need to call this method.
example
```js
import Spokestack from 'react-native-spokestack'
// ...
<Button title="Listen" onPress={() => Spokestack.activate()} />
```
Returns: Promise&lt;void&gt;
▸ deactivate(): Promise&lt;void&gt;
Deactivate the speech pipeline. If the profile includes wakeword, the pipeline will go back to listening for the wakeword. If VAD is active, the pipeline can reactivate without calling activate().
example
```js
import Spokestack from 'react-native-spokestack'
// ...
<Button title="Stop listening" onPress={() => Spokestack.deactivate()} />
```
Returns: Promise&lt;void&gt;
▸ synthesize(input, format?, voice?): Promise&lt;string&gt;
Synthesize some text into speech. Returns a Promise&lt;string&gt; with the string being the URL for a playable mpeg.
There is currently only one free voice available ("demo-male"). The voice can be changed if you have created a custom voice using a Spokestack Maker account. See https://spokestack.io/pricing#maker.
example
```js
const url = await Spokestack.synthesize('Hello world')
// play() stands in for your app's own audio playback function
play(url)
```
Name | Type |
---|---|
input | string |
format? | [TTSFormat](#TTSFormat) |
voice? | string |

Returns: Promise&lt;string&gt;
▸ speak(input, format?, voice?): Promise&lt;void&gt;
Synthesize some text into speech and then immediately play the audio through the default audio system. Audio session handling can get very complex and we recommend using a RN library focused on audio for anything more than very simple playback.
There is currently only one free voice available ("demo-male").
example
```js
await Spokestack.speak('Hello world')
```
Name | Type |
---|---|
input | string |
format? | [TTSFormat](#TTSFormat) |
voice? | string |

Returns: Promise&lt;void&gt;
▸ classify(utterance): Promise&lt;SpokestackNLUResult&gt;
Classify the utterance using the intent/slot Natural Language Understanding model passed to Spokestack.initialize(). See https://www.spokestack.io/docs/concepts/nlu for more info.
example
```js
const result = await Spokestack.classify('hello')
// Here's what the result might look like,
// depending on the NLU model
console.log(result.intent) // launch
```
Name | Type |
---|---|
utterance | string |

Returns: Promise&lt;SpokestackNLUResult&gt;
▸ isInitialized(): Promise&lt;boolean&gt;
Returns whether Spokestack has been initialized
example
```js
console.log(`isInitialized: ${await Spokestack.isInitialized()}`)
```
Returns: Promise&lt;boolean&gt;
▸ isStarted(): Promise&lt;boolean&gt;
Returns whether the speech pipeline has been started
example
```js
console.log(`isStarted: ${await Spokestack.isStarted()}`)
```
Returns: Promise&lt;boolean&gt;
▸ isActivated(): Promise&lt;boolean&gt;
Returns whether the speech pipeline is currently activated
example
```js
console.log(`isActivated: ${await Spokestack.isActivated()}`)
```
Returns: Promise&lt;boolean&gt;
• confidence: number
A number from 0 to 1 representing the NLU model's confidence in the intent it recognized, where 1 represents absolute confidence.
• intent: string
The intent based on the match provided by the NLU model
• slots: SpokestackNLUSlots
Data associated with the intent, provided by the NLU model
▪ [key: string]: SpokestackNLUSlot
• rawValue: string
The original string value of the slot recognized in the user utterance
• type: string
The slot's type, as defined in the model metadata
• value: any
The parsed (typed) value of the slot recognized in the user utterance
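For example, with a hypothetical NLU model that defines a "SetTimer" intent with a "duration" slot, the result could be used like this (intent and slot names depend entirely on your model):

```js
const result = await Spokestack.classify('set a timer for ten minutes')
// "SetTimer" and "duration" are hypothetical names from an example model
if (result.intent === 'SetTimer') {
  const slot = result.slots.duration
  console.log(slot.rawValue) // e.g. "ten minutes"
  console.log(slot.value) // the parsed, typed value
  console.log(slot.type) // the slot type from the model metadata
}
```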
▸ addEventListener(eventType, listener, context?): EmitterSubscription
Bind to any event emitted by the native libraries. See Events for a list of all available events.
example
```js
useEffect(() => {
  const listener = Spokestack.addEventListener('recognize', onRecognize)
  // Unsubscribe by calling remove when the component unmounts
  return () => {
    listener.remove()
  }
}, [])
```
Name | Type | Description |
---|---|---|
eventType | string | Name of the event for which to register the listener |
listener | (event: any) => void | The listener function |
context? | Object | Context of the listener |

Returns: EmitterSubscription
▸ removeEventListener(eventType, listener): void
Remove an event listener
example
```js
Spokestack.removeEventListener('recognize', onRecognize)
```
Name | Type | Description |
---|---|---|
eventType | string | Name of the event whose listener should be removed |
listener | (...args: any[]) => any | The listener function to remove |

Returns: void
▸ removeAllListeners(): void
Remove any existing listeners
example
```js
Spokestack.removeAllListeners()
```

Returns: void
Three formats are supported when using Spokestack TTS: raw text, SSML, and Speech Markdown. See https://www.speechmarkdown.org/ if unfamiliar with Speech Markdown. IPA is expected when using SSML or Speech Markdown.
• SPEECHMARKDOWN = 2
• SSML = 1
• TEXT = 0
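For example, Speech Markdown can be passed to synthesize() by specifying the format. This is a sketch; it assumes the TTSFormat enum is available on the default export, mirroring the PipelineProfile usage shown earlier.

```js
const url = await Spokestack.synthesize(
  'Hello [500ms] world', // Speech Markdown break syntax
  Spokestack.TTSFormat.SPEECHMARKDOWN,
  'demo-male' // currently the only free voice
)
```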
Use addEventListener(), removeEventListener(), and removeAllListeners() to add and remove event handlers. All events are available on both iOS and Android.
Name | Data | Description |
---|---|---|
recognize | { transcript: string } | Fired whenever speech recognition completes successfully. |
partial_recognize | { transcript: string } | Fired whenever the transcript changes during speech recognition. |
start | null | Fired when the speech pipeline starts (which begins listening for wakeword or starts VAD). |
stop | null | Fired when the speech pipeline stops. |
activate | null | Fired when the speech pipeline activates, either through the VAD, wakeword, or when calling .activate(). |
deactivate | null | Fired when the speech pipeline deactivates. |
play | { playing: boolean } | Fired when TTS playback starts and stops. See the speak() function. |
timeout | null | Fired when an active pipeline times out due to lack of recognition. |
trace | { message: string } | Fired for trace messages. Verbosity is determined by the traceLevel option. |
error | { error: string } | Fired when there's an error in Spokestack. |
When an error event is triggered, any existing promises are rejected.
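A sketch of wiring up several of these events together; the state setters and handler bodies are illustrative:

```js
useEffect(() => {
  // Update the UI with the in-progress transcript as the user speaks
  Spokestack.addEventListener('partial_recognize', ({ transcript }) =>
    setTranscript(transcript)
  )
  Spokestack.addEventListener('timeout', () => setListening(false))
  Spokestack.addEventListener('error', ({ error }) => console.error(error))
  return () => Spokestack.removeAllListeners()
}, [])
```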
These are the configuration options that can be passed to Spokestack.initialize(_, _, spokestackConfig). No options in SpokestackConfig are required.
SpokestackConfig has the following structure:
```ts
interface SpokestackConfig {
  /**
   * This option is only used when remote URLs are passed to fields such as `wakeword.filter`.
   *
   * Set this to true to allow downloading models over cellular.
   * Note that `Spokestack.initialize()` will still reject the promise if
   * models need to be downloaded but there is no network at all.
   *
   * Ideally, the app will include network handling itself and
   * inform the user about file downloads.
   *
   * Default: false
   */
  allowCellularDownloads?: boolean
  /**
   * Wakeword, Keyword, and NLU model files are cached internally.
   * Set this to true whenever a model is changed
   * during development to refresh the internal model cache.
   *
   * This affects models passed with `require()` as well
   * as models downloaded from remote URLs.
   *
   * Default: true in dev mode, false otherwise
   *
   * **Important:** By default, apps in production will
   * cache models to avoid downloading them every time
   * the app is launched. The side effect of this optimization
   * is that if models change on the CDN, apps will
   * not pick up those changes unless the app is reinstalled.
   * We think this is a fair trade-off, but set this to `true`
   * if you prefer to download the models every time the app
   * is launched.
   */
  refreshModels?: boolean
  /**
   * This controls the log level for the underlying native
   * iOS and Android libraries.
   * Also add a `"trace"` event listener to get trace events.
   * See the TraceLevel enum for values.
   */
  traceLevel?: TraceLevel
  /**
   * Most of these options are advanced aside from "profile"
   */
  pipeline?: PipelineConfig
  /** Only needed if using Spokestack.classify */
  nlu?: NLUConfig
  /**
   * Only required for wakeword
   * Most options are advanced aside from
   * filter, encode, and detect for specifying model files.
   */
  wakeword?: WakewordConfig
  /**
   * Only required for the keyword recognizer
   * Most options are advanced aside from
   * filter, encode, detect, metadata, and classes.
   */
  keyword?: KeywordConfig
}
```
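For example, a more verbose development setup might raise the trace level and listen for trace events. This is a sketch; it assumes the TraceLevel enum is available on the default export in the same way PipelineProfile is used above.

```js
Spokestack.addEventListener('trace', ({ message }) => console.log(message))

await Spokestack.initialize(clientId, clientSecret, {
  traceLevel: Spokestack.TraceLevel.DEBUG,
  // Re-download changed models during development
  // (already the default in dev mode)
  refreshModels: true
})
```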
How much logging to show. A lower number means more logs.
• DEBUG = 10
• INFO = 30
• NONE = 100
• PERF = 20
Pipeline profiles set up the speech pipeline based on your needs
• PTT_NATIVE_ASR = 2
Apple/Android Automatic Speech Recognition is on when the speech pipeline is active. This is likely the more common profile when not using wakeword.
• PTT_SPOKESTACK_ASR = 5
Spokestack Automatic Speech Recognition is on when the speech pipeline is active. This is likely the more common profile when not using wakeword but Spokestack ASR is preferred.
• TFLITE_WAKEWORD_KEYWORD = 6
VAD-sensitive TFLite wake word activates TFLite keyword recognizer
• TFLITE_WAKEWORD_NATIVE_ASR = 0
Set up wakeword and use local Apple/Android ASR. Note that wakeword.filter, wakeword.encode, and wakeword.detect are required if any wakeword profile is used.
• TFLITE_WAKEWORD_SPOKESTACK_ASR = 3
Set up wakeword and use remote Spokestack ASR. Note that wakeword.filter, wakeword.encode, and wakeword.detect are required if any wakeword profile is used.
• VAD_KEYWORD_ASR = 7
VAD-triggered TFLite Keyword Recognizer
• VAD_NATIVE_ASR = 1
Apple/Android Automatic Speech Recognition is on when Voice Activity Detection triggers it.
• VAD_SPOKESTACK_ASR = 4
Spokestack Automatic Speech Recognition is on when Voice Activity Detection triggers it.
• Optional agcCompressionGainDb: number
advanced
Android-only for AcousticGainControl
Dynamic range compression rate, in dB
• Optional agcTargetLevelDbfs: number
advanced
Android-only for AcousticGainControl
Target peak audio level, in -dBFS; to maintain a peak of -9dB, configure a value of 9
• Optional ansPolicy: "aggressive" | "very-aggressive" | "mild" | "medium"
advanced
Android-only for AcousticNoiseSuppressor
Noise policy
• Optional bufferWidth: number
advanced
Buffer width, used with frameWidth to determine the buffer size
• Optional frameWidth: number
advanced
Speech frame width, in ms
• Optional profile: PipelineProfile
Profiles are collections of common configurations for Pipeline stages.
If no profile is set explicitly, Spokestack determines a sensible default profile based on the config passed to Spokestack.initialize():
If wakeword config files are set (and keyword config is not), the default will be set to TFLITE_WAKEWORD_NATIVE_ASR.
If keyword config files are set (and wakeword config is not), the default will be set to VAD_KEYWORD_ASR.
If both wakeword and keyword config files are set, the default will be set to TFLITE_WAKEWORD_KEYWORD.
Otherwise, the default is PTT_NATIVE_ASR.
• Optional sampleRate: number
Audio sampling rate, in Hz
• Optional vadFallDelay: number
advanced
Falling-edge detection run length, in ms; this value determines how many negative samples must be received to flip the detector to negative
• Optional vadMode: "quality" | "low-bitrate" | "aggressive" | "very-aggressive"
Voice activity detector mode
• Optional vadRiseDelay: number
advanced
Android-only
Rising-edge detection run length, in ms; this value determines how many positive samples must be received to flip the detector to positive
• metadata: string | number
The JSON file for NLU metadata. If specified, model and vocab are also required.
This field accepts 2 types of values: a remote URL string (the file will be downloaded and cached when first calling initialize), or a source object returned by require or import (e.g. metadata: require('./metadata.sjson')).
IMPORTANT: a special extension (.sjson) is used for local metadata JSON files when using require or import so the file is not parsed when included but instead passed as a source object. This makes it so the file is read and parsed by the underlying native libraries instead.
• model: string | number
The NLU Tensorflow-Lite model. If specified, metadata and vocab are also required.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. model: require('./nlu.tflite')).
• vocab: string | number
A txt file containing the NLU vocabulary. If specified, model and metadata are also required.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. vocab: require('./vocab.txt')).
• Optional inputLength: number
• detect: string | number
The "detect" Tensorflow-Lite model. If specified, filter and encode are also required.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. detect: require('./detect.tflite')).
The detect model performs the final classification over the encoder output; its inputs should be shaped [encode-length, encode-width].
• encode: string | number
The "encode" Tensorflow-Lite model. If specified, filter and detect are also required.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. encode: require('./encode.tflite')).
The encode model is used to perform each autoregressive step over the mel frames; its inputs should be shaped [mel-length, mel-width], and its outputs [encode-width], with an additional state input/output shaped [state-width].
• filter: string | number
The "filter" Tensorflow-Lite model. If specified, detect and encode are also required.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. filter: require('./filter.tflite')).
The filter model is used to calculate a mel spectrogram frame from the linear STFT; its inputs should be shaped [fft-width], and its outputs [mel-width].
• Optional activeMax: number
advanced
The maximum length of an activation, in milliseconds, used to time out the activation
• Optional activeMin: number
advanced
The minimum length of an activation, in milliseconds, used to ignore a VAD deactivation after the wakeword
• Optional requestTimeout: number
iOS-only
Length of time to allow an Apple ASR request to run, in milliseconds. Apple has an undocumented limit of 60000ms per request.
• Optional rmsAlpha: number
advanced
Android-only
The Exponentially-Weighted Moving Average (EWMA) update rate for the current RMS signal energy (0 for no RMS normalization)
• Optional rmsTarget: number
advanced
Android-only
The desired linear Root Mean Squared (RMS) signal energy, which is used for signal normalization and should be tuned to the RMS target used during training
• Optional wakewords: string | string[]
iOS-only
An ordered array or comma-separated list of wakeword keywords. Only necessary when not passing the filter, detect, and encode paths.
• detect: string | number
The "detect" Tensorflow-Lite model. If specified, filter and encode are also required.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. detect: require('./detect.tflite')).
The detect model performs the final classification over the encoder output; its inputs should be shaped [encode-length, encode-width].
• encode: string | number
The "encode" Tensorflow-Lite model. If specified, filter and detect are also required.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. encode: require('./encode.tflite')).
The encode model is used to perform each autoregressive step over the mel frames; its inputs should be shaped [mel-length, mel-width], and its outputs [encode-width], with an additional state input/output shaped [state-width].
• filter: string | number
The "filter" Tensorflow-Lite model. If specified, detect and encode are also required.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. filter: require('./filter.tflite')).
The filter model is used to calculate a mel spectrogram frame from the linear STFT; its inputs should be shaped [fft-width], and its outputs [mel-width].
Either metadata or classes is required, and they are mutually exclusive.
• metadata: string | number
The JSON file for Keyword metadata. Required if keyword.classes is not specified.
This field accepts 2 types of values: a remote URL string (downloaded and cached), or a source object returned by require or import (e.g. metadata: require('./metadata.sjson')).
IMPORTANT: a special extension (.sjson) is used for local metadata JSON files when using require or import so the file is not parsed when included but instead passed as a source object. This makes it so the file is read and parsed by the underlying native libraries instead.
• classes: string | string[]
A comma-separated list or an ordered array of class names for the keywords.
The name corresponding to the most likely class will be returned in the transcript field when the recognition event is raised.
Required if keyword.metadata is not specified.
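A sketch of a keyword config using classes; the model files and class names here are placeholders for your own keyword model:

```js
await Spokestack.initialize(clientId, clientSecret, {
  pipeline: {
    profile: Spokestack.PipelineProfile.VAD_KEYWORD_ASR
  },
  keyword: {
    filter: require('./keyword_filter.tflite'),
    detect: require('./keyword_detect.tflite'),
    encode: require('./keyword_encode.tflite'),
    // The transcript of a recognize event will be one of these names
    classes: ['up', 'down', 'left', 'right']
  }
})
```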
These properties can be passed to either the wakeword or keyword config object, but are not shared.
• Optional encodeLength: number
advanced
The length of the sliding window of encoder output used as an input to the classifier, in milliseconds
• Optional encodeWidth: number
advanced
The size of the encoder output, in vector units
• Optional fftHopLength: number
advanced
The length of time to skip each time the overlapping STFT is calculated, in milliseconds
• Optional fftWindowSize: number
advanced
The size of the signal window used to calculate the STFT, in number of samples - should be a power of 2 for maximum efficiency
• Optional fftWindowType: string
advanced
Android-only
The name of the windowing function to apply to each audio frame before calculating the STFT; currently the "hann" window is supported
• Optional melFrameLength: number
advanced
The length of time to skip each time the overlapping STFT is calculated, in milliseconds
• Optional melFrameWidth: number
advanced
The size of each mel spectrogram frame, in number of filterbank components
• Optional preEmphasis: number
advanced
The pre-emphasis filter weight to apply to the normalized audio signal (0 for no pre-emphasis)
• Optional stateWidth: number
advanced
The size of the encoder state, in vector units (defaults to wake-encode-width)
• Optional threshold: number
advanced
The threshold of the classifier's posterior output, above which the trigger activates the pipeline, in the range [0, 1]
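These advanced values are rarely needed. As an illustration, a wakeword config could raise the classifier threshold to reduce false activations; the value here is illustrative, not a recommended default:

```js
await Spokestack.initialize(clientId, clientSecret, {
  wakeword: {
    filter: require('./filter.tflite'),
    detect: require('./detect.tflite'),
    encode: require('./encode.tflite'),
    // Require a more confident posterior before activating (illustrative value)
    threshold: 0.9
  }
})
```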
Apache-2.0
Copyright 2021 Spokestack