How to reset the transcript/results?

kutsan commented 1 month ago

Apologies in advance, this is more of a question than an issue. Is there a way to reset the transcript provided with the result event, or do I need to stop and restart the listener to get a new transcript? I need continuous transcription processing and want to retrieve a new transcript for each process. Additionally, it would be helpful if this library could offer a convenient hook, similar to what is available in react-speech-recognition. An additional resetTranscript functionality could be included with this hook. For example:

const {
  transcript,
  resetTranscript,
  isListening,
  isError,
  error,
} = useSpeechRecognition()

jamsch commented 1 month ago

Hi @kutsan, since this library uses events very similar to those of the Web Speech API, you could write a hook that resembles the one in react-speech-recognition, like the following:

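import { useState } from "react";
import { useSpeechRecognitionEvent } from "expo-speech-recognition";

export function useSpeechRecognition() {
  const [transcript, setTranscript] = useState("");
  const [isListening, setIsListening] = useState(false);
  const [error, setError] = useState<string | null>(null);

  // Track the listening state, and clear any stale error when a new session starts.
  useSpeechRecognitionEvent("start", () => {
    setError(null);
    setIsListening(true);
  });
  useSpeechRecognitionEvent("end", () => setIsListening(false));
  useSpeechRecognitionEvent("error", (event) => setError(event.error));
  // Keep the most recent transcript; fall back to "" if no result is present.
  useSpeechRecognitionEvent("result", (event) => {
    setTranscript(event.results[0]?.transcript ?? "");
  });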

  return {
    transcript,
    resetTranscript: () => setTranscript(""),
    isListening,
    isError: Boolean(error),
    error,
  };
}

As for resetting the transcript periodically throughout a continuous speech recognition session, there are a few things to keep in mind about how the Android and iOS speech recognizers behave:

For iOS, you'll only receive results with isFinal: false, which build up a single string until you call .stop(), at which point an isFinal: true result is processed, or .abort(), which stops immediately.
For Android, you'll also receive partial results, but whenever there's enough silence a final result will be processed. When that happens a new segment starts, meaning any further result transcripts won't include recognized speech from the previous segment.
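
To make the difference concrete, here is a minimal sketch of how the two behaviors could be folded into one running transcript. The `ResultLike` and `TranscriptState` types and both function names are hypothetical; they only mirror the `transcript` and `isFinal` fields discussed above and are not part of the library's API.

```typescript
// Hypothetical shape: only the two fields discussed above.
interface ResultLike {
  transcript: string;
  isFinal: boolean;
}

interface TranscriptState {
  finalized: string[]; // segments that ended with isFinal: true
  partial: string; // latest non-final transcript of the current segment
}

const emptyState: TranscriptState = { finalized: [], partial: "" };

// Fold one result into the state. A final result closes the current segment;
// a partial result replaces the previous partial for that segment.
function reduceResult(state: TranscriptState, result: ResultLike): TranscriptState {
  if (result.isFinal) {
    return { finalized: [...state.finalized, result.transcript], partial: "" };
  }
  return { finalized: state.finalized, partial: result.transcript };
}

// Join the closed segments and the in-progress partial into one string.
function fullTranscript(state: TranscriptState): string {
  return [...state.finalized, state.partial]
    .filter((segment) => segment.length > 0)
    .join(" ");
}
```

With an Android-style sequence the `finalized` list grows at each pause, while with an iOS-style sequence everything stays in `partial` until `.stop()` is called.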

So what I'd explore is the following:

Calling .abort() and then .start() immediately afterwards, or
Storing the "replaced" transcript in state and using transcript.replace(...) to reset the transcript.

The first option doesn't have much downtime, at least on iOS, so it may be preferable.
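
For the second option, a rough sketch of the idea follows. It remembers the transcript as it stood at reset time and strips that text from the front of later results. It uses a prefix check with `slice` rather than `transcript.replace(...)`, so that the same words appearing later in the sentence are not removed by accident. `visibleTranscript` and `ResettableTranscript` are hypothetical names, not part of the library.

```typescript
// Return the part of `full` that was spoken after the last reset.
function visibleTranscript(full: string, hidden: string): string {
  if (hidden.length > 0 && full.startsWith(hidden)) {
    return full.slice(hidden.length).trimStart();
  }
  // The stored prefix no longer matches (for example, a new segment started),
  // so show the transcript as-is.
  return full;
}

class ResettableTranscript {
  private latest = ""; // most recent transcript from a result event
  private hidden = ""; // transcript as it stood at the last reset

  // Call with the transcript of every result event.
  update(full: string): void {
    this.latest = full;
  }

  // Hide everything recognized so far.
  reset(): void {
    this.hidden = this.latest;
  }

  current(): string {
    return visibleTranscript(this.latest, this.hidden);
  }
}
```

One caveat with this approach: if the recognizer revises words that were already hidden, the prefix stops matching and the full string is shown again, so it is worth testing on both platforms.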

As an alternative, this library is compatible with the Web Speech API. In theory you could do something like the following, as long as react-speech-recognition doesn't rely on other Web APIs (I'd discourage it though):

import SpeechRecognition, {
  useSpeechRecognition,
} from "react-speech-recognition";
import { ExpoWebSpeechRecognition } from "expo-speech-recognition";

SpeechRecognition.applyPolyfill(ExpoWebSpeechRecognition);

function MyComponent() {
  const { transcript } = useSpeechRecognition();
}

kutsan commented 1 month ago

Thank you for your detailed response! I followed your suggestion and ended up writing a hook. Your tips were also helpful!

As for:

Or storing the "replaced" transcript in state and using transcript.replace(...) to reset the transcript.

That seems like the best approach for now, and I'll likely implement something as you suggested.

Thanks again!

jamsch / expo-speech-recognition

How to reset the transcript/results? #17