k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

is it possible to have realtime keyword spotting in flutter #1248

Open jtdLab opened 1 month ago

jtdLab commented 1 month ago

Hi, is it possible to use sherpa-onnx for keyword spotting on a live microphone stream in a Flutter app?

jtdLab commented 1 month ago

I tried to modify the Dart example that reads from a file, but could not make it work for streaming. I get the audio from the mic via https://pub.dev/packages/flutter_sound but it never detects any keyword.

csukuangfj commented 4 weeks ago

Could you show your changes?

jtdLab commented 4 weeks ago

Model setup should be okay (ModelLoader just loads sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01 and unpacks it so the app can use it):

  Future<void> initialize({
    required int sampleRate,
    required String language,
  }) async {
    sherpa_onnx.initBindings();

    final transducer = sherpa_onnx.OnlineTransducerModelConfig(
      encoder: await _modelLoader.encoderPath(_modelName(language)),
      decoder: await _modelLoader.decoderPath(_modelName(language)),
      joiner: await _modelLoader.joinerPath(_modelName(language)),
    );
    final modelConfig = sherpa_onnx.OnlineModelConfig(
      transducer: transducer,
      tokens: await _modelLoader.tokensPath(_modelName(language)),
    );
    final config = sherpa_onnx.KeywordSpotterConfig(
      model: modelConfig,
      keywordsFile: await _modelLoader.keywordsPath(_modelName(language)),
    );
    _spotter = sherpa_onnx.KeywordSpotter(config);
    _stream = _spotter.createStream();
    _sampleRate = sampleRate;
  }

When I now call predict with samples emitted from the flutter_sound stream, it returns null every time.

  String? predict(Uint8List samples) {
    final samplesFloat32 = _convertBytesToFloat32(samples);
    _stream.acceptWaveform(
      samples: samplesFloat32,
      sampleRate: _sampleRate,
    );

    while (_spotter.isReady(_stream)) {
      _spotter.decode(_stream);
    }

    final keyword = _spotter.getResult(_stream).keyword;
    if (keyword.isNotEmpty) {
      print('Detected: $keyword');
      return keyword;
    }

    return null;
  }
}

  Float32List _convertBytesToFloat32(
    Uint8List bytes, [
    Endian endian = Endian.little,
  ]) {
    // Two bytes of 16-bit PCM per output sample.
    final values = Float32List(bytes.length ~/ 2);

    // sublistView respects the Uint8List's own offset and length;
    // ByteData.view(bytes.buffer) can misread a sliced buffer.
    final data = ByteData.sublistView(bytes);

    for (var i = 0; i < bytes.length; i += 2) {
      final short = data.getInt16(i, endian);
      // Normalize int16 PCM to [-1.0, 1.0).
      values[i ~/ 2] = short / 32768.0;
    }

    return values;
  }

The flutter_sound config is 16-bit PCM at a 16000 Hz sample rate:

  await _recorder.startRecorder(
    toStream: _audioController.sink,
    codec: base.Codec.pcm16,
  );
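One thing worth ruling out here: this startRecorder call relies on defaults for the sample rate and channel count. A hedged sketch that pins both explicitly, so the recorder output is guaranteed to match the 16 kHz mono input the model expects (sampleRate and numChannels are flutter_sound's startRecorder parameters; verify against the package version in use):

  // Sketch: request 16 kHz mono PCM16 explicitly instead of relying
  // on flutter_sound's defaults for the platform.
  await _recorder.startRecorder(
    toStream: _audioController.sink,
    codec: base.Codec.pcm16,
    sampleRate: 16000,
    numChannels: 1,
  );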

csukuangfj commented 4 weeks ago

Please check _spotter.ptr and _stream.ptr to see if they are null.

I suspect that model initialization failed.

Make sure you read the logs carefully.

Note: you can pass debug: true to OnlineModelConfig to get more logs.
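A minimal sketch of both suggestions, reusing the transducer and config variables from the initialize snippet above (I am assuming the OnlineModelConfig debug flag and the ptr fields on the Dart wrappers; check against the sherpa_onnx Dart API you have installed):

    // Sketch: enable verbose native logs while loading the model.
    final modelConfig = sherpa_onnx.OnlineModelConfig(
      transducer: transducer,
      tokens: await _modelLoader.tokensPath(_modelName(language)),
      debug: true, // print model-loading details from the native library
    );

    _spotter = sherpa_onnx.KeywordSpotter(config);
    _stream = _spotter.createStream();

    // If either native handle is null, initialization failed; re-check
    // the logs and the unpacked model file paths.
    print('spotter ptr: ${_spotter.ptr}');
    print('stream ptr: ${_stream.ptr}');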

csukuangfj commented 4 weeks ago

By the way, please change

    while (_spotter.isReady(_stream)) {
      _spotter.decode(_stream);
    }

    final keyword = _spotter.getResult(_stream).keyword;
    if (keyword.isNotEmpty) {
      print('Detected: $keyword');
      return keyword;
    }

You need to put _spotter.getResult in the while loop.
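A sketch of the corrected predict with that change applied, i.e. getResult called inside the decode loop so a keyword fired mid-chunk is not lost (same class members as the snippet above):

    // Sketch: check the result after every decode step, not just once
    // after all pending frames have been consumed.
    String? predict(Uint8List samples) {
      final samplesFloat32 = _convertBytesToFloat32(samples);
      _stream.acceptWaveform(
        samples: samplesFloat32,
        sampleRate: _sampleRate,
      );

      while (_spotter.isReady(_stream)) {
        _spotter.decode(_stream);
        final keyword = _spotter.getResult(_stream).keyword;
        if (keyword.isNotEmpty) {
          print('Detected: $keyword');
          return keyword;
        }
      }

      return null;
    }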

In case you have not read our KWS dart example, please read it now: https://github.com/k2-fsa/sherpa-onnx/blob/a7dc6c2c165de16c68daaf78490d159f51c54d44/dart-api-examples/keyword-spotter/bin/zipformer-transducer.dart#L72-L78

It clearly shows how to do that.