spokestack / react-native-spokestack

Spokestack: give your React Native app a voice interface!
https://spokestack.io
Apache License 2.0

Not getting speech recognised in Android #56

Closed Mahesh5645 closed 4 years ago

Mahesh5645 commented 4 years ago

Hello,

I really appreciate this plugin now that I have studied it (I am new to the IT world, so I might sound childish; please bear with me). My environment is:

"react": "16.9.0",
"react-native": "0.61.5",
"react-native-spokestack": "^2.1.2",

I was looking for something like this (streaming audio & wake word detection) for my app. (Tip: please edit the README to replace google-credential with JSON.stringify(google-credential.json), as it took me a long time to realise this.)
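For reference, here is a minimal sketch of how I ended up loading the credentials (the file name and path are placeholders for wherever your service-account JSON lives):

// Minimal sketch: Metro can bundle local JSON, so the service-account file can be
// required directly and passed to Spokestack as a string.
const googleCredentials = require("./google-credentials.json"); // placeholder path

const properties = {
  // ...other pipeline properties...
  "google-credentials": JSON.stringify(googleCredentials),
};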

I want to build an app where the audio streams for an hour or so, and if it catches a wake word like "App listen this" then it starts an event. The problem I am facing is that, after configuring everything and putting the code in place, the start and activate commands work but there is no response after that. I get the following log for the onActivate event: {isActive: true, error: "", message: null, transcript: "", event: "ACTIVATE"}. I get the following logs for the onTrace event:

{isActive: true, error: "", message: "vad: true", transcript: "", event: "TRACE"}
vad: true
{isActive: true, error: "", message: "vad: false", transcript: "", event: "TRACE"}
vad: false

I don't know how to proceed further so that the app detects my wake word. Please help me achieve my goal; I will really appreciate your efforts.

I have also followed the example given in Issue #14 https://github.com/rtmalone/spokestack-example/blob/master/App.js

Below is my VoiceTest code.

import React, { Component } from 'react';
import { StyleSheet, Text, View, Image, TouchableOpacity, ImageBackground, TouchableHighlight, Platform } from 'react-native';
import Spokestack from "react-native-spokestack";

class VoiceTest extends Component {
  state = {
    spoken: "",
    recording: false,
    message: null
  };

  constructor(props) {
    super(props);
  }
  _startRecognizing = async () => {
    console.log("Inside voice recognising")
    try {
        // Start and stop the speech pipeline. All methods can be called repeatedly.
        Spokestack.start(); // start speech pipeline. can only start after initialize is called.
        console.log("Log  Spokestack.start(); ");
        Spokestack.activate();
        const logEvent = e => console.log("Log is ::",e);
        Spokestack.onActivate = logEvent;
        Spokestack.onSpeechStarted = logEvent;
        Spokestack.onSpeechEnded = logEvent;
        Spokestack.onSpeechRecognized = this.speechDetected;
        Spokestack.onRecognize = e => {
          logEvent(e);
          console.log("onRecognize :: ",e.transcript); // "Hello Spokestack"
        };
        Spokestack.onError = e => {
          Spokestack.stop();
          logEvent("onError "+e);
        };
        Spokestack.onTrace = e => { // subscribe to tracing events according to the trace-level property
          logEvent(e);
          console.log(e.message);
        };

    } catch (e) {
      console.log("error", e)
    }

  }

  startAudio() {
    if (Spokestack && Platform.OS === "android") {
      console.log("inside component")
      Spokestack.initialize({
        input: "com.pylon.spokestack.android.MicrophoneInput", // required, provides audio input into the stages
        stages: [
          "com.pylon.spokestack.webrtc.VoiceActivityDetector", // voice activity detection. necessary to trigger speech recognition.
          "com.pylon.spokestack.google.GoogleSpeechRecognizer" // one of the two supplied speech recognition services
          // 'com.pylon.spokestack.microsoft.BingSpeechRecognizer'
        ],
        properties: {
          "vad-mode": "aggressive",
          "vad-rise-delay": 30,
          "vad-fall-delay": 40,
          "sample-rate": 16000,
          "frame-width": 20,
          "buffer-width": 20,
          "locale": "en-US",
          "google-credentials": JSON.stringify( google-credential.json), // Android-supported api
         // "google-api-key": "", // iOS supported google api
          // 'bing-speech-api-key': YOUR_BING_VOICE_CREDENTIALS,
           "trace-level": Spokestack.TraceLevel.DEBUG
        }
      });
    }
  }

  speechDetected = e => {
    console.log("speech")
    if (e.transcript.length > 0) {
      this.setState({ spoken: e.transcript });
    }
  };

  render() {
    const { recording, message, spoken } = this.state;
    console.log("state is ", this.state);
    return (
      <ImageBackground
        resizeMode={'cover'}
        style={{ flex: 1, justifyContent: "center", alignItems: "center" }} // must be passed from the parent, the number may vary depending upon your screen size
        source={require('../../../assets/images/meditate.jpg')}
      >
        <TouchableOpacity style={styles.MessageBox} onPress={() => { this.setState({ recording: true }, () => this.startAudio()); }} >
          <Text style={{ paddingLeft: 10, paddingBottom: 10, marginBottom: 10 }}>Please click to Start Voice initialise</Text>
        </TouchableOpacity>
        {recording && <TouchableOpacity style={styles.MessageBox} onPress={() => { this._startRecognizing() }} >
          <Text style={{ paddingLeft: 10, paddingBottom: 10, marginBottom: 10 }}>Say "Start" to start listening</Text>
        </TouchableOpacity>}
        <View style={[styles.MessageBox, { height: 300, justifyContent: "center", alignContent: "center", alignItems: "center" }]} >
          {recording && <Text>Heard: "{spoken}"</Text>}
          {message && <Text style={styles.message}>{message}</Text>}
        </View>
      </ImageBackground>
    );
  }
}

const styles = StyleSheet.create({
  button: {
    width: 50,
    height: 50,
  },
  container: {
    flex: 1,
    justifyContent: 'center',
    alignItems: 'center',
    backgroundColor: '#F5FCFF',
  },
  welcome: {
    fontSize: 20,
    textAlign: 'center',
    margin: 10,
  },
  action: {
    textAlign: 'center',
    color: '#0000FF',
    marginVertical: 5,
    fontWeight: 'bold',
  },
  instructions: {
    textAlign: 'center',
    color: '#333333',
    marginBottom: 5,
  },
  stat: {
    textAlign: 'center',
    color: '#B0171F',
    marginBottom: 1,
  },
  MessageBox: {
    backgroundColor: "rgba(255,255,255,0.7)",
    minWidth: '90%',
    paddingTop: 10,
    // height:100,
    margin: 20,
    paddingBottom: 15,
    borderBottomLeftRadius: 10,
    // borderBottomRightRadius: number
    borderTopLeftRadius: 10,
    borderTopRightRadius: 10,
    shadowColor: "#000",
    shadowOffset: {
      width: 0,
      height: 4,
    },
    justifyContent: "center",
    alignItems: 'center',
    shadowOpacity: 0.30,
    shadowRadius: 4.65,
    elevation: 8,
    marginBottom: 10,
  },
});
VoiceTest.navigationOptions = ({ navigation }) => {
  return {
    header: null,
  };
};
export default VoiceTest;

Below is my android/app/build.gradle file:

apply plugin: "com.android.application"
import com.android.build.OutputFile

project.ext.react = [
    entryFile: "index.js",
    enableHermes: false,  // clean and rebuild if changing
]

apply from: "../../node_modules/react-native/react.gradle"

/**
 * Set this to true to create two separate APKs instead of one:
 *   - An APK that only works on ARM devices
 *   - An APK that only works on x86 devices
 * The advantage is the size of the APK is reduced by about 4MB.
 * Upload all the APKs to the Play Store and people will download
 * the correct one based on the CPU architecture of their device.
 */
def enableSeparateBuildPerCPUArchitecture = false

/**
 * Run Proguard to shrink the Java bytecode in release builds.
 */
def enableProguardInReleaseBuilds = false

/**
 * The preferred build flavor of JavaScriptCore.
 *
 * For example, to use the international variant, you can use:
 * `def jscFlavor = 'org.webkit:android-jsc-intl:+'`
 *
 * The international variant includes ICU i18n library and necessary data
 * allowing to use e.g. `Date.toLocaleString` and `String.localeCompare` that
 * give correct results when using with locales other than en-US.  Note that
 * this variant is about 6MiB larger per architecture than default.
 */
def jscFlavor = 'org.webkit:android-jsc:+'

/**
 * Whether to enable the Hermes VM.
 *
 * This should be set on project.ext.react and mirrored here.  If it is not set
 * on project.ext.react, JavaScript will not be compiled to Hermes Bytecode
 * and the benefits of using Hermes will therefore be sharply reduced.
 */
def enableHermes = project.ext.react.get("enableHermes", false);

android {
    compileSdkVersion rootProject.ext.compileSdkVersion

    packagingOptions {
        exclude 'project.properties'
        exclude 'META-INF/INDEX.LIST'
        exclude 'META-INF/DEPENDENCIES'
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

    defaultConfig {
        multiDexEnabled true

        applicationId "com.malacounting"
        minSdkVersion rootProject.ext.minSdkVersion
        targetSdkVersion rootProject.ext.targetSdkVersion
        versionCode 1
        versionName "1.0"
    }
    splits {
        abi {
            reset()
            enable enableSeparateBuildPerCPUArchitecture
            universalApk false  // If true, also generate a universal APK
            include "armeabi-v7a", "x86", "arm64-v8a", "x86_64"
        }
    }
    signingConfigs {
        debug {

        }
    }
    buildTypes {
        debug {
            signingConfig signingConfigs.debug
        }
        release {
            // Caution! In production, you need to generate your own keystore file.
            // see https://facebook.github.io/react-native/docs/signed-apk-android.
            signingConfig signingConfigs.debug
            minifyEnabled enableProguardInReleaseBuilds
            proguardFiles getDefaultProguardFile("proguard-android.txt"), "proguard-rules.pro"
        }
    }
    // applicationVariants are e.g. debug, release
    applicationVariants.all { variant ->
        variant.outputs.each { output ->
            // For each separate APK per architecture, set a unique version code as described here:
            // https://developer.android.com/studio/build/configure-apk-splits.html
            def versionCodes = ["armeabi-v7a": 1, "x86": 2, "arm64-v8a": 3, "x86_64": 4]
            def abi = output.getFilter(OutputFile.ABI)
            if (abi != null) {  // null for the universal-debug, universal-release variants
                output.versionCodeOverride =
                        versionCodes.get(abi) * 1048576 + defaultConfig.versionCode
            }

        }
    }
}

dependencies {
    implementation project(':react-native-spokestack')
    implementation project(':react-native-splash-screen')
    implementation project(':react-native-fs')
    implementation fileTree(dir: "libs", include: ["*.jar"])
    implementation "com.facebook.react:react-native:+"  // From node_modules

    if (enableHermes) {
        def hermesPath = "../../node_modules/hermes-engine/android/";
        debugImplementation files(hermesPath + "hermes-debug.aar")
        releaseImplementation files(hermesPath + "hermes-release.aar")
    } else {
        implementation jscFlavor
    }
}

// Run this once to be able to run the application with BUCK
// puts all compile dependencies into folder libs for BUCK to use
task copyDownloadableDepsToLibs(type: Copy) {
    from configurations.compile
    into 'libs'
}

apply from: file("../../node_modules/@react-native-community/cli-platform-android/native_modules.gradle"); applyNativeModulesAppBuildGradle(project)

My android/build.gradle is as below:

// Top-level build file where you can add configuration options common to all sub-projects/modules.

buildscript {
    ext {
        buildToolsVersion = "28.0.3"
        minSdkVersion = 16
        compileSdkVersion = 28
        targetSdkVersion = 28
    }
    repositories {
        google()
        jcenter()
    }
    dependencies {
        classpath("com.android.tools.build:gradle:3.4.2")
        classpath('com.google.gms:google-services:3.1.0')

        // NOTE: Do not place your application dependencies here; they belong
        // in the individual module build.gradle files
    }
}

allprojects {
    repositories {
        google()
        mavenLocal()
        maven {
            // All of React Native (JS, Obj-C sources, Android binaries) is installed from npm
            url("$rootDir/../node_modules/react-native/android")
        }
        maven {
            // Android JSC is installed from npm
            url("$rootDir/../node_modules/jsc-android/dist")
        }

        jcenter()
        maven { url 'https://jitpack.io' }
        maven { url "https://maven.google.com" }

    }
}

I can't see where I went wrong, even after following everything as described. I have enabled the Cloud Speech-to-Text API in my Google Cloud console too. After resolving all errors I am stuck with no recognition, although the API dashboard shows that requests are going through.

Method                                           | Requests | Errors | Avg latency | Latency
google.cloud.speech.v1.Speech.StreamingRecognize | 26       | 57.69% | 2 minutes   | 8 minutes

Any help with this is really appreciated, and I am hoping for an early reply. In the meantime I am going through the Java code of "com.pylon.spokestack".

noelweichbrodt commented 4 years ago

Hi Mahesh,

Thanks for the note about adding JSON.stringify to the README; that's an oversight on my part, and the fix will help others in the future!

With regard to not getting an onRecognize event, I see there is some confusion about what each event means.

In your pipeline activation closure:

  _startRecognizing = async () => {
    console.log("Inside voice recognising")
    try {
        // Start and stop the speech pipeline. All methods can be called repeatedly.
        Spokestack.start(); // start speech pipeline. can only start after initialize is called.
        console.log("Log  Spokestack.start(); ");
        Spokestack.activate();

You start the pipeline, which begins listening for a wakeword. Then you immediately activate the pipeline, which stops listening for a wakeword and begins a streaming ASR request to Google's cloud speech-to-text api.

The debug traces you're seeing are vad, which means that voice activity detection was triggered, followed by activate, which means that the ASR was activated.

Assuming that you don't want wakeword-activated ASR, this is fine so far. If you do want to avoid streaming ASR to Google until a wakeword is heard, then just starting the pipeline is sufficient. For more information on the speech pipeline, this (Android-specific) doc might help: https://spokestack.io/docs/Android/speech-pipeline.
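For example, a wakeword-driven flow would look roughly like the sketch below (this is just a sketch of the behavior described above; activate() is only needed for manual, push-to-talk style triggering):

// Sketch: wakeword-driven ASR. Only start() is called up front; the pipeline
// activates ASR itself once the wakeword is heard.
Spokestack.onActivate = e => console.log("wakeword heard, ASR active", e);
Spokestack.onRecognize = e => console.log("transcript:", e.transcript);

Spokestack.start(); // listen passively for the wakeword

// Spokestack.activate() would only be called from something like a
// push-to-talk button, not as part of the wakeword path.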

These lines are concerning:

        Spokestack.onSpeechRecognized = this.speechDetected;
        Spokestack.onRecognize = e => {
          logEvent(e);
          console.log("onRecognize :: ",e.transcript); // "Hello Spokestack"
        };

Spokestack does not emit an onSpeechRecognized event, so your speechDetected function will never be called. I think what you want is to rewrite the above lines into just: Spokestack.onRecognize = this.speechDetected;
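In other words, roughly this (speechDetected being your existing handler that reads e.transcript):

// Corrected wiring: onRecognize is the event that carries the transcript.
Spokestack.onRecognize = this.speechDetected;
// Spokestack.onSpeechRecognized does not exist, so that assignment can be removed.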

Finally, I see from your pasted Google API request log that your latency is extremely high. It should be in the millisecond range, not the minute range. Perhaps network conditions are interfering with Google's Cloud Speech-to-Text API requests? There was a similar issue with long-latency requests: https://github.com/spokestack/react-native-spokestack/issues/52

Mahesh5645 commented 4 years ago

Hello @noelweichbrodt,

Thanks for your earnest reply. I apologize for the typo, which I copied from the example at https://github.com/rtmalone/spokestack-example/blob/master/App.js.

I have replaced the code as you mentioned, and below is the code I am using now. I have added a stop function because the trace events keep firing forever.

import React, { Component } from 'react';
import { StyleSheet, Text, View, Image, TouchableOpacity, ImageBackground, TouchableHighlight, Platform } from 'react-native';
import Spokestack from "react-native-spokestack";

class VoiceTest extends Component {
  state = {
    spoken: "",
    recording: false,
    message: null
  };

  constructor(props) {
    super(props);
    const logEvent = e => console.log("Log is ::",e);
    Spokestack.onRecognize = e => {
      logEvent(e);
      console.log("onRecognize :: ",e.transcript); // "Hello Spokestack"
    };
    Spokestack.onError = e => {
      Spokestack.deactivate()
      Spokestack.stop();
      logEvent("onError "+e);
    };
    Spokestack.onTrace = e => { // subscribe to tracing events according to the trace-level property
      logEvent(e);
      if (e.message === "vad: true") {
        console.log("Activating speech")
        Spokestack.activate();
      }
      console.log(e.message);
    };
  }
  _startRecognizing = async () => {
    console.log("Inside voice recognising")
    try {
        // Start and stop the speech pipeline. All methods can be called repeatedly.
        Spokestack.start(); // start speech pipeline. can only start after initialize is called.
        console.log("Log  Spokestack.start(); ");  
    } catch (e) {
      console.log("error", e)
    }
  }
  stopAudio(){
    Spokestack.deactivate()
    Spokestack.stop();
  }
  startAudio() {
    if (Spokestack && Platform.OS === "android") {
      console.log("inside component")
      Spokestack.initialize({
        input: "com.pylon.spokestack.android.MicrophoneInput", // required, provides audio input into the stages
        stages: [
          "com.pylon.spokestack.webrtc.VoiceActivityDetector", // voice activity detection. necessary to trigger speech recognition.
          "com.pylon.spokestack.google.GoogleSpeechRecognizer" // one of the two supplied speech recognition services
          // 'com.pylon.spokestack.microsoft.BingSpeechRecognizer'
        ],
        properties: {
          "vad-mode": "aggressive",
          "vad-rise-delay": 30,
          "vad-fall-delay": 40,
          "sample-rate": 16000,
          "frame-width": 20,
          "buffer-width": 20,
          "locale": "en-US",
          "google-credentials": JSON.stringify(google-credentials.json), // Android-supported api
         // "google-api-key": "", // iOS supported google api
          // 'bing-speech-api-key': YOUR_BING_VOICE_CREDENTIALS,
           "trace-level": Spokestack.TraceLevel.DEBUG
        }
      });
    }
  }
  render() {
    const { recording, message, spoken } = this.state;
    console.log("state is ", this.state);
    return (
      <ImageBackground
        resizeMode={'cover'}
        style={{ flex: 1, justifyContent: "center", alignItems: "center" }} // must be passed from the parent, the number may vary depending upon your screen size
        source={require('../../../assets/images/meditate.jpg')}
      >
        <TouchableOpacity style={styles.MessageBox} onPress={() => { this.setState({ recording: !recording }, () => (recording ? this.stopAudio() : this.startAudio())) }} >
          <Text style={{ paddingLeft: 10, paddingBottom: 10, marginBottom: 10 }}>Please click to Start Voice initialise</Text>
        </TouchableOpacity>
        {recording && <TouchableOpacity style={styles.MessageBox} onPress={() => { this._startRecognizing() }} >
          <Text style={{ paddingLeft: 10, paddingBottom: 10, marginBottom: 10 }}>Say "Start" to start listening</Text>
        </TouchableOpacity>}
        <View style={[styles.MessageBox, { height: 300, justifyContent: "center", alignContent: "center", alignItems: "center" }]} >
          {recording && <Text>Heard: "{spoken}"</Text>}
          {message && <Text style={styles.message}>{message}</Text>}
        </View>
      </ImageBackground>
    );
  }
}

const styles = StyleSheet.create({
  button: {
    width: 50,
    height: 50,
  },
  container: {
    flex: 1,
    justifyContent: 'center',
    alignItems: 'center',
    backgroundColor: '#F5FCFF',
  },
  welcome: {
    fontSize: 20,
    textAlign: 'center',
    margin: 10,
  },
  action: {
    textAlign: 'center',
    color: '#0000FF',
    marginVertical: 5,
    fontWeight: 'bold',
  },
  instructions: {
    textAlign: 'center',
    color: '#333333',
    marginBottom: 5,
  },
  stat: {
    textAlign: 'center',
    color: '#B0171F',
    marginBottom: 1,
  },
  MessageBox: {
    backgroundColor: "rgba(255,255,255,0.7)",
    minWidth: '90%',
    paddingTop: 10,
    // height:100,
    margin: 20,
    paddingBottom: 15,
    borderBottomLeftRadius: 10,
    // borderBottomRightRadius: number
    borderTopLeftRadius: 10,
    borderTopRightRadius: 10,
    shadowColor: "#000",
    shadowOffset: {
      width: 0,
      height: 4,
    },
    justifyContent: "center",
    alignItems: 'center',
    shadowOpacity: 0.30,
    shadowRadius: 4.65,
    elevation: 8,
    marginBottom: 10,
  },
});
VoiceTest.navigationOptions = ({ navigation }) => {
  return {
    header: null,
  };
};
export default VoiceTest;

After making the above changes, Spokestack.onRecognize is still not detecting anything I am speaking. It still gives me the trace logs mentioned above. In the trace handler I have added a condition to call activate when VAD is true.

Sorry to bother you so much, as I am new to the IT world. I want to build the app with a feature like this: the app will listen to what I am saying but will activate/start an event only when it detects a wake word. For example: I will start the app and keep it on while someone is speaking, but when I say "Party time" it will start playing songs.

I hope I am not troubling you too much, and I request you to kindly take some of your valuable time to help me out with this.

Currently my code is not able to detect/recognize anything I am speaking; I am not receiving any transcript from my speech.

Mahesh5645 commented 4 years ago

Hello @noelweichbrodt, I hope I am not bothering you too much with my questions and requirements. I have gone through the Spokestack documentation and found that the wakeword is "Spokestack", which will get recognized and will activate the pipeline. Am I correct on this? I request you to kindly help me out with my app; I am stuck at this point.

noelweichbrodt commented 4 years ago

Hi Mahesh,

the wakeword is "Spokestack", which will get recognized and will activate the pipeline. Am I correct on this?

That is correct. To create a custom wakeword model (such as "Party time"), get in touch for an estimate by using the "Talk to us" button on https://spokestack.io. If you do have the machine learning background, https://spokestack.io/docs/Concepts/wakeword-models provides the specifications for creating your own model that can be dropped into Spokestack.
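Once you have the model files, the pipeline config would gain a wakeword trigger stage plus the model-path properties, roughly like the sketch below. Treat the class and property names as assumptions taken from the wakeword docs; check them against the spokestack-android version bundled with your react-native-spokestack release:

// Hypothetical sketch only: the wakeword stage and wake-*-path property names are
// assumptions based on the linked docs and may differ in your installed version.
Spokestack.initialize({
  input: "com.pylon.spokestack.android.MicrophoneInput",
  stages: [
    "com.pylon.spokestack.webrtc.VoiceActivityDetector",
    "com.pylon.spokestack.wakeword.WakewordTrigger",     // assumed wakeword stage
    "com.pylon.spokestack.google.GoogleSpeechRecognizer"
  ],
  properties: {
    "wake-filter-path": "path/to/filter.lite",  // the three TensorFlow Lite models
    "wake-encode-path": "path/to/encode.lite",  // described in the wakeword-models doc
    "wake-detect-path": "path/to/detect.lite",
    "locale": "en-US",
    "google-credentials": JSON.stringify(require("./google-credentials.json")),
    "trace-level": Spokestack.TraceLevel.DEBUG
  }
});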

After making the above changes, Spokestack.onRecognize is still not detecting anything I am speaking.

Given what you've mentioned so far, the Google Speech API latency indicates that Spokestack is sending speech successfully but never getting a response from Google. That would be the first place to look when troubleshooting why you never receive an onRecognize event.

Hope this helps!

Mahesh5645 commented 4 years ago

Hello @noelweichbrodt,

Thanks for your help, I really appreciate it. I will try the ML model for the wake word, and if I run into any problems I will reopen this issue.