react-native-voice / voice

:microphone: React Native Voice Recognition library for iOS and Android (Online and Offline Support)
MIT License

After a small pause (silence), react-native-voice emits a beep and stops recording speech on Android. It works fine on iOS though. Please help! #441

Open amefire opened 1 year ago

amefire commented 1 year ago

After a small pause (silence), react-native-voice emits a beep and stops recording speech on Android. It works fine on iOS though. Please help!

vinesh4Real commented 1 year ago

We are also facing this issue. Can someone help with this?

joorjeh commented 1 year ago

Also facing this issue.

ofirgeller commented 1 year ago

Looks like 'onSpeechEnd' happens but there is still a small gap where the engine is listening, then the beep comes. Maybe this is a change to the google API to give users warning before the vtt cuts off?

amefire commented 1 year ago

Looks like 'onSpeechEnd' happens but there is still a small gap where the engine is listening, then the beep comes. Maybe this is a change to the google API to give users warning before the vtt cuts off?

Right after the beep, the recording stops working. There's no gap, @ofirgeller, or at least I haven't noticed one.

Thanks.

ofirgeller commented 1 year ago

I meant it might go like this: onSpeechEnd event -> gap -> end of recording + beep

amefire commented 1 year ago

I meant it might go like this: onSpeechEnd event -> gap -> end of recording + beep

Okay sir, any suggestion on this ?

ofirgeller commented 1 year ago

Personally, I'm migrating to a different service; I've had enough of the Google VTT API breaking without warning. If you can't do that, dive into the Android code (Java), following the latest documentation, and see if you can set some params to prevent this :(

On Tue, Jun 27, 2023 at 5:36 PM Abdallah Mefire @.***> wrote:

I meant it might go like this: onSpeechEnd event gap end of recording + beep

Okay sir, any suggestion on this ?


amefire commented 1 year ago

Personally, I'm migrating to a different service. had enough of the google vtt API breaking without warning. If you can't do that, dive into the android code (java) following the latest documentation and see if you can maybe set some params to prevent this :(

Okay, which one are you planning on using? What could be an alternative to this service?

ofirgeller commented 1 year ago

too early for me to endorse a service since I'm mid migration but current plan is:

  1. use react-native-player-recorder to capture the audio
  2. use the Whisper AI SaaS API to recognize it.

advantage: the models are open source, so if cost gets out of hand we can self host.

disadvantage: no streaming, so no intermediate results, and slower results if the audio is long enough so the file size matters. other services allow customization of the model (beyond language), whisper does not.

rexFX commented 1 year ago

I think I have solved this, may need more testing though (edit: tested multiple times, it works)

after installing the module, go to node_modules/@react-native-voice/voice/

now look for the code below in the following two files:

  1. index.ts under src folder
  2. index.js under dist folder

Voice.startSpeech(
          locale,
          Object.assign(
            {
              EXTRA_LANGUAGE_MODEL: 'LANGUAGE_MODEL_FREE_FORM',
              EXTRA_MAX_RESULTS: 5,
              EXTRA_PARTIAL_RESULTS: true,
              REQUEST_PERMISSIONS_AUTO: true,
            },
            options,
          ),
          callback,
        );

add these lines there:

EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS: 60000, // minimum how long it should listen in milliseconds
EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS: 8000, // how long it should wait after silence in milliseconds

edit: instead of the above two lines, this one also works: EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 10000,

it should look like this now:

Voice.startSpeech(
          locale,
          Object.assign(
            {
              EXTRA_LANGUAGE_MODEL: 'LANGUAGE_MODEL_FREE_FORM',
              EXTRA_MAX_RESULTS: 5,
              EXTRA_PARTIAL_RESULTS: true,
              REQUEST_PERMISSIONS_AUTO: true,
              EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS: 60000, // minimum how long it should listen
              EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS: 8000, // how long it should wait after silence
            },
            options,
          ),
          callback,
        );

after adding them, save both files and then run this command

npm start -- --reset-cache

it should work. worked for me, it kept listening for longer than 40 seconds which i tested

you can find additional options here: https://developer.android.com/reference/android/speech/RecognizerIntent
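
Because the snippet above merges the caller's `options` over the built-in defaults, the same extras can presumably be supplied from app code (as the second argument to `Voice.start`) instead of editing files under node_modules. The sketch below only illustrates that merge pattern; `RecognizerOptions`, `DEFAULT_OPTIONS` and `mergeRecognizerOptions` are hypothetical names, not part of the library, and object spread is used here in place of the `Object.assign` call.

```typescript
// Hypothetical names for illustration; only the option keys come from this thread.
type RecognizerOptions = Record<string, string | number | boolean>;

// The defaults shown in the library snippet above.
const DEFAULT_OPTIONS: RecognizerOptions = {
  EXTRA_LANGUAGE_MODEL: 'LANGUAGE_MODEL_FREE_FORM',
  EXTRA_MAX_RESULTS: 5,
  EXTRA_PARTIAL_RESULTS: true,
  REQUEST_PERMISSIONS_AUTO: true,
};

// Later sources win, so caller-supplied keys override or extend the defaults.
// A fresh object is returned; DEFAULT_OPTIONS is never mutated.
function mergeRecognizerOptions(options: RecognizerOptions = {}): RecognizerOptions {
  return { ...DEFAULT_OPTIONS, ...options };
}

const merged = mergeRecognizerOptions({
  EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 10000,
});
console.log(merged);
```

Whether the recognizer on a given device honours these extras is a separate question; the merge only guarantees that they reach the native side alongside the defaults.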

you can check how the options are being set in the file below:

ofirgeller commented 1 year ago

Passing these params (or any others) from user code or adding them directly where you did crashes the app for me.

rexFX commented 1 year ago

For me the app only crashed when I pressed stop or when I kept it listening for like around a minute or so

Using Voice.cancel() instead of Voice.stop() seems to fix the issue, the downside is that it won't give you results after you stop listening

edit: it is still crashing after silence phase :\

rexFX commented 1 year ago

@ofirgeller hey can you go to this file: node_modules/@react-native-voice/voice/android/src/main/java/com/wenkesj/voice/VoiceModule.java

and then go to line 352 which is this one:

for (String result : matches) {
        arr.pushString(result);
}

and wrap it like this:

if (matches != null) {
      for (String result : matches) {
        arr.pushString(result);
      }
}

save the file and do npm start -- --reset-cache

does this help?

ofirgeller commented 1 year ago

@rexFX That does prevent the crash, so... passing in the params causes an empty (null) matches results which otherwise does not happen?

also, EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS seems to still not be respected, are you able to make it work?

EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS does seem to be respected.

rexFX commented 1 year ago

@ofirgeller You can use partial results; they do produce output, but it gets cleared quickly. I found that whenever there is a gap, it ends listening and then starts listening again with an empty result (null).

It produces a result only after the EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS interval has elapsed, and by then the result has become empty (null).

I haven't tested EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS. Since we are not setting a 'silence time' in that case, it produces output as soon as a gap is detected after the minimum length; that's why I think it is being respected here.


Ignore what I said below; it doesn't work.

I tried various things to save the output for the silence one but nothing worked, ended up making my own logic. You can try:

in the VoiceModule.java file, add these variables:

// we will store the partial results in this array and keep track of current index using recordingSize.

private ArrayList<String> lastRecorded = new ArrayList<String>();
private int recordingSize = -1;

after that look for this method public void onBeginningOfSpeech() and add these two lines:

//whenever speech is detected after a gap, increment recordingSize and push new string so that we can edit it later.

recordingSize++;
lastRecorded.add("");

after that look for this method public void onPartialResults(Bundle results) and add this line just above WritableMap event = Arguments.createMap();

// always store the latest non-empty partial result in the current index
// (isEmpty() is used because != compares String references in Java; the index check avoids set(-1, ...))

if (recordingSize >= 0 && matches != null && matches.size() > 0 && !matches.get(0).isEmpty()) lastRecorded.set(recordingSize, matches.get(0));

after that go to this method public void onResults(Bundle results) and add an else if statement there, it would look like this:

// if result is null then we can use our stored partial result, after using it we clear it.

    if (matches != null) {
      for (String result : matches) {
        arr.pushString(result);
      }
    }
    else if (recordingSize != -1) {
      for (String result: lastRecorded) {
        arr.pushString(result);
      }
    }

    lastRecorded.clear();
    recordingSize = -1;

save and do npm start -- --reset-cache

now you should see the outputs in the result, if I find a better way then I will update here.

ofirgeller commented 1 year ago

There is no point in saving the result of silence, since it is null from a practical POV, but maybe I misunderstand what your latest code is trying to do.

I was talking about the VTT stopping when the user is not speaking for 1 second, even when COMPLETE_SILENCE_LENGTH_MILLIS is set to 5 seconds and I would expect it to not stop (until 5 seconds of silence).

this is a long standing API issue AFAIK

rexFX commented 1 year ago

In the VoiceModule.java:

When user speaks, onBeginningOfSpeech gets called. When user stops speaking, onEndOfSpeech gets called and just after that onResults is called.

When we set complete silence length, it will still call onEndOfSpeech instantly when it thinks the user has stopped speaking. It doesn't care about silence length, but onResults is not called until silence length is over and it also becomes NULL for some reason.

I think this silence length option is there to prevent the recognition process from stopping completely, i.e. if the user says anything after a short gap, it treats that as a new session and starts listening from scratch without the user needing to do anything; otherwise it stops the whole process.

devlprkhan commented 10 months ago

any update? :(

GuilhermeMReis commented 10 months ago

Still happening 😢 😭

devlprkhan commented 10 months ago

@GuilhermeMReis or anybody else, here's how I fixed this issue. Below is the step-by-step guide:

### To extend the delay for complete silence in speech recognition using the react-native-voice package, follow these steps:

  1. On the JS Side (React Native):
  2. On the Android Side:
  3. Patching Process:

### Step-by-Step Guide:

1. On the JS Side (React Native)

    const options = {
      EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 10000, // Adjust the value as needed
    };

  1. Use the options in your startSpeech function, like this:

    const startSpeech = async () => {
      // ... Other Code
      const options = {
        EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 10000, // Adjust the value as needed
      };
      try {
        await Voice.start('en-US', options);
        setSpeaking(true);
        // ... Other Code
      } catch (e) {
        setSpeaking(false);
        console.log(e);
      }
    };

2. On the Android Side:

  1. Locate VoiceModule.java: Navigate to node_modules/@react-native-voice/voice/android/src/main/java/com/wenkesj/voice/. Open the VoiceModule.java file.

  2. Replace the onResults function with the function below:

    public void onResults(Bundle results) {
      ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
      if (matches != null && !matches.isEmpty()) {
        WritableArray arr = Arguments.createArray();
        for (String result : matches) {
          arr.pushString(result);
        }
        WritableMap event = Arguments.createMap();
        event.putArray("value", arr);
        sendEvent("onSpeechResults", event);
      } else {
        // Handle the case where results are null or empty
        // You can send an event or take appropriate action
        WritableMap event = Arguments.createMap();
        event.putBoolean("error", true);
        sendEvent("onSpeechResults", event);
      }
    }

3. Patching Process

Save the file, rebuild your React Native project, and patch the package 📦 (if you don't know how to do that, search online).

ahsanbhatti98 commented 7 months ago

Guys, you can use an alternative: https://github.com/sunboykenneth/react-native-voicebox-speech-rec. It is far better than this one, with no speech pause or any other issue. I also switched to it.

codding123vbf commented 6 months ago

did someone fix this issue yet ?

codding123vbf commented 6 months ago

Guys you can use its alternative https://github.com/sunboykenneth/react-native-voicebox-speech-rec it is far better than this . No speech pause or and other issue in this library i also shifted to this one.

bro this link is redirecting me to the same library

codding123vbf commented 6 months ago

I fixed it by calling the start function again and again, using setInterval with a 5-second interval.

ahsanbhatti98 commented 5 months ago

Guys you can use its alternative https://github.com/sunboykenneth/react-native-voicebox-speech-rec it is far better than this . No speech pause or and other issue in this library i also shifted to this one.

bro this link is redirecting me to the same library

https://github.com/sunboykenneth/react-native-voicebox-speech-rec I don't know why that happens; here is the new one.

ahsanbhatti98 commented 5 months ago

i fixed it by calling the start function again and again with setinterval of 5 seconds

But won't it start the speech from scratch? It won't do continuous speech, right? Or did you solve that?

nara-falconeer commented 5 months ago

This is what I'm doing and I think I'm happy with the results.

  1. Set EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS and EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS to 2000, so that a slightly longer silence is needed before speech is treated as finished

    await Voice.start('en-US', {
    EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 2000,
    EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS: 2000,
    });
  2. Edit the VoiceModule.java's onResults thus:

    public void onResults(Bundle results) {
    WritableArray arr = Arguments.createArray();
    
    ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    if (matches != null) { // this is the change that makes sure the crash doesn't happen
      for (String result : matches) {
        arr.pushString(result);
      }
    }
    WritableMap event = Arguments.createMap();
    event.putArray("value", arr);
    sendEvent("onSpeechResults", event);
    Log.d("ASR", "onResults()");
    }
  3. One more issue I found is the sequence of callbacks. After the silence is detected, first onSpeechEnd is called, then one more onSpeechPartialResults that has the final text value, and finally onSpeechResults is called with an empty string. To deal with this sequence, I'm relying on onSpeechResults to indicate end of speech (instead of onSpeechEnd) and on the last text value returned by onSpeechPartialResults. If you're using a state variable to store the text in onSpeechPartialResults, remember that you can't just read that state variable in onSpeechResults, because the closure will have captured the value (likely an empty string) at the time it was created. You might have to use a ref (useRef) like this:
      const inputTextRef = useRef('');

      and set the ref in onSpeechPartialResults like this:

      inputTextRef.current = text;

      NOT QUITE. KEEP READING!!

  4. Another minor thing I had to work around. If there are short pauses that would have been recognized as end of speech before increasing the silence timeout, this restarts onSpeechPartialResults. So, you can't rely on onSpeechPartialResults being the complete string that captures all speech until now. You have to deal with this with another workaround in onSpeechPartialResults. Create a couple more useRef strings: previousTextToPrependRef and lastResultRef. Then, in onSpeechPartialResults, do the following:

    // the length check keeps e.value[0] from being undefined when the array is empty
    if (e.value && e.value.length > 0 && !e.value[0].startsWith(lastResultRef.current)) {
        previousTextToPrependRef.current = inputTextRef.current;
    }
    if (e.value && e.value.length > 0) {
        inputTextRef.current = previousTextToPrependRef.current + e.value[0];
        lastResultRef.current = e.value[0];
    }

    So far so good. I'll modify this post if my results change. (I've edited 4 times already - hopefully for the last time!)
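
The last two steps above can be sketched as a small framework-free helper, which makes the stitching logic easier to test outside a component. This is only a sketch under stated assumptions: `TranscriptAccumulator` and its method names are hypothetical (not part of @react-native-voice/voice), and the space inserted between segments is an addition; the snippet above concatenates segments without a separator.

```typescript
// Hypothetical helper mirroring the three refs described above.
class TranscriptAccumulator {
  private committed = '';   // text of earlier segments (previousTextToPrependRef)
  private lastPartial = ''; // most recent partial result (lastResultRef)
  private text = '';        // stitched transcript so far (inputTextRef)

  // Feed the `value` array of each onSpeechPartialResults event.
  onPartial(value?: string[]): void {
    const partial = value?.[0];
    if (partial === undefined) return; // missing or empty array: nothing to record

    // A partial that does not extend the previous one is taken to mean the
    // recognizer restarted after a pause, so the text so far is committed.
    if (!partial.startsWith(this.lastPartial)) {
      this.committed = this.text;
    }
    // Assumption: join segments with a single space.
    const separator = this.committed !== '' && partial !== '' ? ' ' : '';
    this.text = this.committed + separator + partial;
    this.lastPartial = partial;
  }

  // Call when onSpeechResults fires; returns the stitched text and resets state.
  finish(): string {
    const result = this.text;
    this.committed = '';
    this.lastPartial = '';
    this.text = '';
    return result;
  }
}

const acc = new TranscriptAccumulator();
acc.onPartial(['hello']);
acc.onPartial(['hello world']);
acc.onPartial(['how']);          // recognizer restarted after a pause
acc.onPartial(['how are you']);
const transcript = acc.finish();
console.log(transcript);
```

In a component, one instance could be kept in a useRef, fed from the onSpeechPartialResults handler and read in onSpeechResults. One limitation of the prefix heuristic is worth keeping in mind: if the recognizer revises an earlier word within the same segment, the new partial no longer starts with the previous one and is treated as a new segment.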

Asharuddin-90 commented 5 months ago

I think this library is better because there is no pause at all. (https://github.com/sunboykenneth/react-native-voicebox-speech-rec)

lutfi-haslab commented 2 months ago

I think this library is better because there is no pause nothing. (https://github.com/sunboykenneth/react-native-voicebox-speech-rec)

No, I am using this library and it breaks my build; I think it needs to be built first.

ahsanbhatti98 commented 2 months ago

I think this library is better because there is no pause nothing. (https://github.com/sunboykenneth/react-native-voicebox-speech-rec)

Yes, I also switched to it. It is better than this one.

ObscurusGrassator commented 2 months ago

I think this library is better because there is no pause nothing. (https://github.com/sunboykenneth/react-native-voicebox-speech-rec)

This library has few options, only the basics. And it has bugs that no one has fixed for a long time.

ObscurusGrassator commented 2 months ago

i fixed it by calling the start function again and again with setinterval of 5 seconds

That's a bad fix. Between two starts there is a short gap in listening, so information is lost.

ledezmarcos commented 2 months ago

FINALLY a valid solution! Thanks!!

ObscurusGrassator commented 2 months ago

The solution from @nara-falconeer resolves the silence between words, but not the initial silence (the delay before speaking). Still, it's the best solution for now. THANKS @nara-falconeer

Bash script to automatically edit VoiceModule.java in this library before the build:

VoiceModule=./node_modules/@react-native-voice/voice/android/src/main/java/com/wenkesj/voice/VoiceModule.java
mv  $VoiceModule ${VoiceModule}_
sed "s/for (String result : matches) {/for (String result : (matches = matches == null ? new ArrayList<String>() : matches)) {/" \
    ${VoiceModule}_ > $VoiceModule
rm ${VoiceModule}_
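
The same substitution can also be written as a pure function, which may be easier to call from a Node-based install step than sed. A minimal sketch: `guardMatchesLoop` is a hypothetical name, and it assumes the loop appears in VoiceModule.java exactly as quoted in this thread.

```typescript
// The loop header as quoted in this thread, and the null-guarded form the sed script writes.
const UNGUARDED_LOOP = 'for (String result : matches) {';
const GUARDED_LOOP =
  'for (String result : (matches = matches == null ? new ArrayList<String>() : matches)) {';

// Replace every unguarded loop header. split/join avoids regex escaping of the
// parentheses and braces, and running it twice changes nothing because the
// guarded form no longer contains the unguarded text.
function guardMatchesLoop(javaSource: string): string {
  return javaSource.split(UNGUARDED_LOOP).join(GUARDED_LOOP);
}

const sample = [
  'ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);',
  'for (String result : matches) {',
  '  arr.pushString(result);',
  '}',
].join('\n');

const patched = guardMatchesLoop(sample);
console.log(patched);
```

Reading the file, applying the function and writing the result back is left out here; the function itself touches nothing on disk.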

Jankaz2 commented 1 month ago

@nara-falconeer hi, could you share the entire code of a component how you solved the problem?