
# svelte-speech-recognition


A Svelte library that converts speech from the microphone to text and makes it available to your Svelte components. Originally based on the react-speech-recognition library.

NOTE: This library is a work in progress and still in alpha; there's still some work to do before v1. That said, it's functional, and any testing would be appreciated.

## How it works

`useSpeechRecognition` is a Svelte hook that gives a component access to a transcript of speech picked up from the user's microphone.

`SpeechRecognition` manages the global state of the Web Speech API, exposing functions to turn the microphone on and off.

Under the hood, this library uses the Web Speech API. Note that browser support for this API is currently limited, with Chrome offering the best experience - see supported browsers for more information.

## Useful links

- Live demo: https://svelte-speech-recognition.netlify.app/

## Installation

To install:

```bash
npm install --save-dev svelte-speech-recognition
```

To import in your Svelte code:

```ts
import SpeechRecognition, { useSpeechRecognition } from 'svelte-speech-recognition/SpeechRecognition';
```

## Basic example

The most basic example of a component using this hook would be:

```sv
<script lang='ts'>
  import SpeechRecognition, { useSpeechRecognition } from 'svelte-speech-recognition/SpeechRecognition';

  const {
      transcriptStore,
      listening,
      resetTranscript,
      browserSupportsSpeechRecognition
  } = useSpeechRecognition();
</script>

<p>This is the final transcript: {$transcriptStore.finalTranscript}</p>

<p>This is the interim transcript: {$transcriptStore.interimTranscript}</p>
```

You can see more examples in the example Svelte app attached to this repo. See Developing.

## Why you should use a polyfill with this library

By default, speech recognition is not supported in all browsers, with the best native experience being available on desktop Chrome. To avoid the limitations of native browser speech recognition, it's recommended that you combine svelte-speech-recognition with a speech recognition polyfill. Polyfills transcribe speech with a cloud provider's service rather than the browser's native engine, so they behave consistently across all modern browsers rather than only in Chrome.

svelte-speech-recognition currently supports polyfills for the following cloud providers:

- Speechly
- Microsoft Azure Cognitive Services

### Cross-browser example

You can find the full guide for setting up a polyfill here. Alternatively, here is a quick (and free) example using Speechly:
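The markup below assumes the polyfill has already been applied in the component's script block. Here is a minimal sketch of that setup, modeled on react-speech-recognition's polyfill flow; the `@speechly/speech-recognition-polyfill` package and the `applyPolyfill` call are assumptions carried over from that upstream library, and the app ID is a placeholder:

```sv
<script lang='ts'>
  import { createSpeechlySpeechRecognition } from '@speechly/speech-recognition-polyfill';
  import SpeechRecognition, { useSpeechRecognition } from 'svelte-speech-recognition/SpeechRecognition';

  // Placeholder: create a (free) Speechly app to get a real app ID
  const appId = '<INSERT_SPEECHLY_APP_ID_HERE>';
  // Assumption: applyPolyfill works as in react-speech-recognition
  SpeechRecognition.applyPolyfill(createSpeechlySpeechRecognition(appId));

  const { transcriptStore, listening, browserSupportsSpeechRecognition } = useSpeechRecognition();
</script>
```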

```sv
{#if !browserSupportsSpeechRecognition}
  <p>Browser doesn't support speech recognition.</p>
{:else}
  <p>Microphone: {listening ? 'on' : 'off'}</p>

  <p>{$transcriptStore.finalTranscript}</p>
{/if}
```


## Detecting browser support for Web Speech API

If you choose not to use a polyfill, this library still fails gracefully on browsers that don't support speech recognition. It is recommended that you render some fallback content if it is not supported by the user's browser:

```sv
{#if !browserSupportsSpeechRecognition}
  <!-- Render some fallback content -->
{/if}
```

## Supported browsers

Without a polyfill, the Web Speech API is largely only supported by Google browsers. As of May 2022, the API is available natively in Chrome (desktop and Android), Chromium-based browsers such as Microsoft Edge and Samsung Internet, and Safari 14.1 and above.

For all other browsers, you can render fallback content using the SpeechRecognition.browserSupportsSpeechRecognition function described above. Alternatively, as mentioned before, you can integrate a polyfill.

## Detecting when the user denies access to the microphone

Even if the browser supports the Web Speech API, the user still has to give permission for their microphone to be used before transcription can begin. They are asked for permission when svelte-speech-recognition first tries to start listening. At this point, you can detect when the user denies access via the isMicrophoneAvailable state. When this becomes false, it's advised that you disable voice-driven features and indicate that microphone access is needed for them to work.

```sv
{#if !isMicrophoneAvailable}
  <!-- Render some fallback content -->
{/if}
```
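A minimal sketch of where that flag comes from, assuming `isMicrophoneAvailable` is returned by `useSpeechRecognition` alongside the other state values:

```sv
<script lang='ts'>
  import { useSpeechRecognition } from 'svelte-speech-recognition/SpeechRecognition';

  // Assumption: isMicrophoneAvailable is part of the hook's return value,
  // like transcriptStore and listening
  const { isMicrophoneAvailable } = useSpeechRecognition();
</script>
```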

## Controlling the microphone

Before consuming the transcript, you should be familiar with SpeechRecognition, which gives you control over the microphone. The state of the microphone is global, so any functions you call on this object will affect all components using useSpeechRecognition.

### Turning the microphone on

To start listening to speech, call the startListening function.

```ts
SpeechRecognition.startListening();
```

This is an asynchronous function, so it will need to be awaited if you want to do something after the microphone has been turned on.
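For example, a minimal sketch of an async click handler (the handler name is illustrative; it assumes the default import shown in Installation):

```ts
const handleStartClick = async () => {
  await SpeechRecognition.startListening();
  // Runs once the microphone has been turned on
  console.log('Microphone is on');
};
```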

### Turning the microphone off

To turn the microphone off, but still finish processing any speech in progress, call stopListening.

```ts
SpeechRecognition.stopListening();
```

To turn the microphone off, and cancel the processing of any speech in progress, call abortListening.

```ts
SpeechRecognition.abortListening();
```
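Putting the three calls together, a minimal sketch of a microphone control bar (the markup is illustrative):

```sv
<script lang='ts'>
  import SpeechRecognition from 'svelte-speech-recognition/SpeechRecognition';
</script>

<!-- Start transcribing, finish any speech in progress, or discard it -->
<button on:click={() => SpeechRecognition.startListening()}>Start</button>
<button on:click={() => SpeechRecognition.stopListening()}>Stop</button>
<button on:click={() => SpeechRecognition.abortListening()}>Abort</button>
```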

## Consuming the microphone transcript

To make the microphone transcript available in your component as a Svelte store (an object holding the `interimTranscript` and `finalTranscript` strings), simply add:

```ts
const { transcriptStore } = useSpeechRecognition();
```
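Because `transcriptStore` is a regular Svelte store, you can also react to it outside the markup. A minimal sketch:

```sv
<script lang='ts'>
  import { useSpeechRecognition } from 'svelte-speech-recognition/SpeechRecognition';

  const { transcriptStore } = useSpeechRecognition();

  // Re-runs whenever new speech is transcribed
  $: console.log('Final transcript so far:', $transcriptStore.finalTranscript);
</script>
```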

## Resetting the microphone transcript

To set the transcript to an empty string, you can call the resetTranscript function provided by useSpeechRecognition. Note that this is local to your component and does not affect any other components using Speech Recognition.

```ts
const { resetTranscript } = useSpeechRecognition();
```
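For instance, wired to a button (a minimal sketch):

```sv
<script lang='ts'>
  import { useSpeechRecognition } from 'svelte-speech-recognition/SpeechRecognition';

  const { transcriptStore, resetTranscript } = useSpeechRecognition();
</script>

<p>{$transcriptStore.finalTranscript}</p>
<button on:click={resetTranscript}>Reset transcript</button>
```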

## Commands

To respond when the user says a particular phrase, you can pass a list of commands to the `useSpeechRecognition` hook. Each command is an object with the following properties (all of which appear in the example below):

- `command`: the phrase (or list of phrases) to listen for; it can include the symbols described below
- `callback`: the function to run when the command phrase is spoken
- `matchInterim`: if true, the command also matches interim (in-progress) results, making it more responsive
- `isFuzzyMatch`: if true, the command matches approximately rather than exactly
- `fuzzyMatchingThreshold`: how similar (between 0 and 1) the spoken phrase must be to the command for a fuzzy match to fire
- `bestMatchOnly`: for fuzzy matching against a list of phrases, fire the callback only for the phrase that matches best, rather than for every phrase above the threshold

### Command symbols

To make commands easier to write, the following symbols are supported:

- Splats (`*`): match any number of words, which are passed to the callback as a string argument (e.g. `'I would like to order *'`)
- Named variables (`:<name>`): match a single word, which is passed to the callback as a string argument (e.g. `'The weather is :condition today'`)

### Example with commands

<script lang="ts">
  import SpeechRecognition, { useSpeechRecognition } from 'svelte-speech-recognition/SpeechRecognition';

  let message = '';
  const setMessage = (newMessage: string) => (message = newMessage);

  const commands = [
    {
      command: 'I would like to order *',
      callback: (food: string) => setMessage(`Your order is for: ${food}`),
      matchInterim: true
    },
    {
      command: 'The weather is :condition today',
      callback: (condition: string) => setMessage(`Today, the weather is ${condition}`)
    },
    {
      command: ['Hello', 'Hi'],
      callback: ({ command }: { command: string }) =>
        setMessage(`Hi there! You said: "${command}"`),
      matchInterim: true
    },
    {
      command: 'Beijing',
      callback: (command: string, spokenPhrase: string, similarityRatio: number) =>
        setMessage(`${command} and ${spokenPhrase} are ${similarityRatio * 100}% similar`),
      // If the spokenPhrase is "Benji", the message would be "Beijing and Benji are 40% similar"
      isFuzzyMatch: true,
      fuzzyMatchingThreshold: 0.2
    },
    {
      command: ['eat', 'sleep', 'leave'],
      callback: (command: string) => setMessage(`Best matching command: ${command}`),
      isFuzzyMatch: true,
      fuzzyMatchingThreshold: 0.2,
      bestMatchOnly: true
    },
    {
      command: 'clear',
      callback: ({ resetTranscript }: { resetTranscript: any }) => resetTranscript(),
      matchInterim: true
    }
  ];

  const { transcriptStore, browserSupportsSpeechRecognition } = useSpeechRecognition({ commands });
  const startListening = () => SpeechRecognition.startListening({ continuous: true });
</script>

{#if browserSupportsSpeechRecognition}
  <div>
    <p>{message}</p>
    <p>{$transcriptStore.finalTranscript}</p>
  </div>
{:else}
  <p>Browser does not support speech recognition.</p>
{/if}