results disappear on real IOS device after a short silence

Jankaz2 commented 1 week ago

Hi, first of all thanks for this library, because it works much better than react-native-voice. Congratulations <3

I have a small problem: when I test it on the iOS emulator, it works perfectly. I can pause while speaking and everything is transcribed correctly. However, on a real device, when I pause while speaking, the results after the pause overwrite the results from before the pause.

Here is my config

//...
    const [settings, _] = useState<ExpoSpeechRecognitionOptions>({
        lang: languageToCode(lang! as Language) ?? 'en-NZ',
        interimResults: true,
        maxAlternatives: 3,
        continuous: true,
        requiresOnDeviceRecognition: false,
        addsPunctuation: true,
        androidIntentOptions: {
            EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 10000,
        },
    });

    useSpeechRecognitionEvent("result", (ev) => {
        const transcript = ev.results[0]?.transcript || "";

        setTranscription((current) => {
            const transcriptTally = ev.isFinal
                ? (current?.transcriptTally ?? "") + transcript
                : (current?.transcriptTally ?? "");

            return {
                transcriptTally,
                transcript: ev.isFinal ? transcriptTally : transcriptTally + transcript,
            };
        });
    });

    useSpeechRecognitionEvent("start", () => {
        setStatus("recognizing");
    });

    useSpeechRecognitionEvent("end", () => {
        setStatus("idle");
    });

    useSpeechRecognitionEvent("error", (ev) => {
        setError(ev);
        errorToast({title: `${t('toasts.somethingWentWrong')}`})
    });

    useSpeechRecognitionEvent("nomatch", (ev) => {
    });

    const startListening = () => {
        if (status !== "idle") return;

        setError(null);
        setStatus('starting');
        ExpoSpeechRecognitionModule.requestPermissionsAsync().then((result) => {
            if (!result.granted) {
                errorToast({title: t('toasts.permissionsNotGranted')})
                return;
            }
            ExpoSpeechRecognitionModule.start(settings);
        });
    };

And the videos presenting the problem

From emulator

https://github.com/user-attachments/assets/c2876b61-6904-4b0b-a66c-5949e8530f5b

The real device

https://github.com/user-attachments/assets/1adcd7a2-2cc4-41d4-927e-0951225d8ba6

Jankaz2 commented 1 week ago

I think the problem is in the API rather than in my code, since I copied the example from this repo and the problem is the same. Could you somehow increase the allowed silence length on iOS (I don't know Swift at all), or maybe suggest a workaround? I would be really grateful.

jamsch commented 1 week ago

Hmm that's quite an unusual issue. The continuous mode on iOS doesn't have any special config around speech silence and under the hood we're only just setting some flags (such as requiresOnDeviceRecognition, addsPunctuation, taskHint, etc) for the underlying speech recognizer prior to starting.

I haven't got an iOS device to test this out on right now but will get back to you in the next day or two. If I were to guess though, this may just have something to do with network-based recognition and you may want to enable requiresOnDeviceRecognition on iOS.

Could you provide me the following:

A console.log() of the data from the start/result/end events on the physical device, pasted here
The iOS version you are running on the physical device
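
For collecting those logs, a minimal sketch of a formatting helper follows. `describeEvent` is a hypothetical name that does not come from the library; it only turns an event name and its payload into one timestamped line that is easy to paste into an issue.

```typescript
// Hypothetical helper (not part of expo-speech-recognition): formats one
// event as a single timestamped line so logs can be pasted as plain text.
function describeEvent(
  name: string,
  payload: unknown = null,
  at: number = Date.now(), // epoch milliseconds; injectable for reproducible output
): string {
  return `[${new Date(at).toISOString()}] ${name} ${JSON.stringify(payload)}`;
}
```

Inside the component, it could be wired to the handlers already shown in this thread, for example `useSpeechRecognitionEvent("result", (ev) => console.log(describeEvent("result", ev)))`, and likewise for "start" and "end".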

Thanks!

Jankaz2 commented 1 week ago

I will send the logs in the next few hours, but for now I think the problem may be related to the iOS version. The first phone where the problem occurs has iOS 18.0 installed. I have now also tested the feature on a phone with iOS 15, and it works fine. Also, I previously used the react-native-voice library with iOS 17.x.x and it worked fine too. Maybe something changed in version 18.0?

jamsch commented 1 week ago

@Jankaz2 Looks like you're right! https://forums.getdrafts.com/t/ios-18-macos-15-beta-warning-for-dictation-users/15334

"Apple has introduced a bug in their speech recognition frameworks that renders it impossible to do long-form dictation when running on iOS 18 or macOS 15. This will appear in Drafts as data being discarded as you dictate and only the most recent utterance being retained."

Looks like it's on iOS 18.1 beta too. Here's a screenshot of one of the tickets that resembles your issue:

Jankaz2 commented 1 week ago

OK, so I guess we just have to wait for a bug fix from Apple. But I have another question: I'm now testing this on various Android emulators and a real device, and speech recognition stops right after I stop talking for a millisecond, even though I have the additional Android configuration EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 10000. Am I missing something?

Edit: OK, I think this problem is also related to the Android version. It does not work on Android 12, but it works well on Android 14.

jamsch commented 1 week ago

Hey @Jankaz2, unfortunately the only config options available for Android 12 and below are the following:

EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS
EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS
EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS

I should document this somewhere on the README, but it's quite well known that any kind of "continuous mode" using the Android SpeechRecognizer (at least for Android 12 and below) doesn't really work and online solutions propose just stopping & starting again. For Android 13+ I went a different route thanks to a newer API (EXTRA_AUDIO_SOURCE) which allowed me to hook up a custom audio recorder to the speech recogniser to avoid these limitations. I think this is the only public repo that actually does something like this.

On Android 12 and below, I've set each of these options to 600 seconds for continuous mode (since we intend it to run indefinitely); however, they don't seem to have any effect, at least on Android 12. On an Android 12 (API 31) emulator you'll likely see the logcat message: "EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS can't be used when EXTRA_PREFER_OFFLINE is false". However, this doesn't seem to apply to Android 14. I also can't verify whether offline speech recognition works at all on Android 12.

So I think the best route you'll have is to just force start the speech recognizer again after it stops as a "hack". Unfortunately this is probably the best you'll get with the current API limitations for Android 12.
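
As a rough illustration of that restart "hack", here is a minimal sketch under a few assumptions. `createAutoRestarter`, its method names, and the `maxRestarts` cap are all hypothetical and not part of the library; the helper only tracks whether the user still wants to be listening and calls a supplied `start` callback again whenever recognition ends on its own.

```typescript
type StartFn = () => void;

interface AutoRestarter {
  begin(): void;        // the user asked to start listening
  finish(): void;       // the user asked to stop; no further restarts
  handleEnd(): boolean; // call on each "end" event; true if recognition was restarted
}

// Hypothetical helper: restarts recognition after it stops by itself.
// maxRestarts is an illustrative safety cap against endless restart loops.
function createAutoRestarter(start: StartFn, maxRestarts = 50): AutoRestarter {
  let wanted = false; // true while the user still expects recognition to run
  let restarts = 0;   // restarts performed since the last begin()

  return {
    begin() {
      wanted = true;
      restarts = 0;
      start();
    },
    finish() {
      wanted = false;
    },
    handleEnd() {
      if (!wanted || restarts >= maxRestarts) return false;
      restarts += 1;
      start();
      return true;
    },
  };
}
```

In the component from this thread, `start` could be `() => ExpoSpeechRecognitionModule.start(settings)`, with `handleEnd()` called from the "end" handler and `finish()` called when the user presses stop. Text accumulated before each restart would still have to be kept on the JS side, as the "result" handler above already does.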

jamsch commented 3 days ago

There's a hacky workaround that I'll be exploring to fix the iOS 18 issue that involves checking whether speechDuration is a positive number. It seems like the Apple engineers intended that this is a final result, so I'll be emitting a result event with isFinal: true here. That way, there shouldn't be any need for further changes on your end.
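
The actual change lives in the library's native iOS code, so the following is only a TypeScript model of the idea, not the released implementation. The `NativeResult` shape and all function names are hypothetical, and the sketch assumes that `speechDuration` stays at 0 for ordinary interim results and becomes positive once iOS 18 has silently finished a segment.

```typescript
// Hypothetical shape of one recognition callback payload (illustrative names).
interface NativeResult {
  transcript: string;
  isFinal: boolean;       // what the recognizer itself reports
  speechDuration: number; // seconds; assumed to be 0 for ordinary interim results
}

interface TranscriptState {
  committed: string; // text from segments already treated as final
  display: string;   // committed text plus the current interim segment
}

// Treat a result as final when the recognizer says so, or when it carries a
// positive speechDuration (the signal described in the comment above).
function isEffectivelyFinal(r: NativeResult): boolean {
  return r.isFinal || r.speechDuration > 0;
}

// Join two segments with a single space, skipping empty ones.
function joinSegments(a: string, b: string): string {
  if (!a) return b;
  if (!b) return a;
  return `${a} ${b}`;
}

// Final segments are appended to the committed text; interim segments are
// only shown after it, so a pause no longer discards earlier text.
function reduceTranscript(state: TranscriptState, r: NativeResult): TranscriptState {
  const combined = joinSegments(state.committed, r.transcript);
  return isEffectivelyFinal(r)
    ? { committed: combined, display: combined }
    : { committed: state.committed, display: combined };
}
```

With this rule, a segment that iOS 18 ends after a pause is committed before the next utterance starts from an empty transcript, which matches the accumulation pattern of the "result" handler at the top of this thread.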

Jankaz2 commented 3 days ago

Cool, let me know when you release a new version :)

jamsch commented 2 days ago

@Jankaz2 I've just released a new version at expo-speech-recognition@0.2.20. Let me know if that solves the issue for you.

Jankaz2 commented 2 days ago

Yes, it works perfectly. Thanks a lot :)

jamsch commented 2 days ago

Sweet! Closing this issue. I might open up an issue for the Android <=12 continuous recording issue, but for the time being I've added a note to the README.

jamsch / expo-speech-recognition

results disappear on real IOS device after a short silence #22