Offline Recognizer - Passing the Language for Multi-Language Models

iprovalo commented 1 month ago

If I want to extend the existing functionality of the Whisper recognizer so that a language can be passed at runtime, what would be the recommended approach? I looked at the decode() API, but adding a new parameter there would require a lot of changes. I was thinking of changing the recognizer's config at runtime instead. Any tips would be appreciated!

csukuangfj commented 1 month ago

Please add two new methods after https://github.com/k2-fsa/sherpa-onnx/blob/117cd7bb8c262580718d87e17f01510c7ead3f92/sherpa-onnx/csrc/offline-recognizer.h#L120


OfflineRecognizerConfig GetConfig() const;

// Onnxruntime Session objects are not affected by this method.
// The exact behavior can be defined by a specific recognizer impl.
// For instance, for the whisper recognizer, you can retrieve the language and task from
// the config and ignore any remaining fields in `config`.
void SetConfig(const OfflineRecognizerConfig& config);

Note that you only need to care about the C++ API. If you want to add APIs for other programming languages, that is also fine.

Also note that you can provide default implementations for the above two newly added methods and you don't need to implement them for all recognizers. You can consider only the whisper recognizer at present.
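
To illustrate the suggestion above, here is a minimal sketch of a base class with a default no-op SetConfig() and a whisper-specific override that copies only the language and task. The struct layouts and class names are simplified, hypothetical stand-ins and not the actual sherpa-onnx definitions; only GetConfig() and SetConfig() come from the comment above.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the real sherpa-onnx config types.
struct WhisperModelConfig {
  std::string encoder;
  std::string decoder;
  std::string language;             // e.g. "en" or "de"
  std::string task = "transcribe";  // or "translate"
};

struct OfflineRecognizerConfig {
  WhisperModelConfig whisper;
  std::string decoding_method = "greedy_search";
};

// Assumed base class: recognizers that do not support runtime
// reconfiguration simply inherit the no-op SetConfig().
class OfflineRecognizerImpl {
 public:
  explicit OfflineRecognizerImpl(const OfflineRecognizerConfig &config)
      : config_(config) {}
  virtual ~OfflineRecognizerImpl() = default;

  OfflineRecognizerConfig GetConfig() const { return config_; }

  // Default implementation: ignore the new config.
  virtual void SetConfig(const OfflineRecognizerConfig & /*config*/) {}

 protected:
  OfflineRecognizerConfig config_;
};

// Whisper override: take the language and task from the new config and
// ignore every other field, so model files and sessions stay untouched.
class OfflineRecognizerWhisperImpl : public OfflineRecognizerImpl {
 public:
  using OfflineRecognizerImpl::OfflineRecognizerImpl;

  void SetConfig(const OfflineRecognizerConfig &config) override {
    config_.whisper.language = config.whisper.language;
    config_.whisper.task = config.whisper.task;
  }
};
```

With this shape, a caller can pass a full config object to SetConfig() without worrying that the encoder or decoder paths in it would reload anything, since the whisper override never reads them.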

csukuangfj commented 1 month ago

If we want to specify an initial prompt for whisper, we can add some new fields to the recognizer config object and the interface can be kept the same after you add the two methods.
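
As a rough sketch of that idea: a new field is added to the config struct, and the whisper-side copy picks it up, while the SetConfig(const OfflineRecognizerConfig&) signature stays the same. The field name `initial_prompt`, the struct layout, and the helper `ApplyWhisperOverrides` are all hypothetical and used only for illustration.

```cpp
#include <string>

// Hypothetical stand-in for the whisper part of the recognizer config.
struct WhisperModelConfig {
  std::string language;
  std::string task = "transcribe";
  std::string initial_prompt;  // assumed new field, name is illustrative
};

// Hypothetical helper: the runtime-changeable subset that a whisper
// SetConfig() would copy. Supporting a new option means adding one line
// here; callers keep using the same SetConfig() method.
void ApplyWhisperOverrides(const WhisperModelConfig &from,
                           WhisperModelConfig *to) {
  to->language = from.language;
  to->task = from.task;
  to->initial_prompt = from.initial_prompt;
}
```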

iprovalo commented 1 month ago

Thank you, @csukuangfj!

iprovalo commented 1 month ago

@csukuangfj I made the changes in this PR: https://github.com/k2-fsa/sherpa-onnx/pull/1124

The Whisper recognizer only overrides the Whisper portion of the model config.

For iOS, SherpaOnnx can now have this:

class SherpaOnnxOfflineRecognizer {
...
  ///      - config: config overwrite
  func setConfig(config: UnsafePointer<SherpaOnnxOfflineRecognizerConfig>!) {
    SherpaOnnxOfflineRecognizerSetConfig(recognizer, config)
  }
}

Then in SherpaOnnxViewModel:

...
    var config = getNonStreamingWhisperTinyLangSpecificConfig(language: language)
    self.offlineRecognizer.setConfig(config: &config)
    startRecorder()

In the model:

Model
...
func getNonStreamingWhisperTinyLangSpecificConfig(language: String) -> SherpaOnnxOfflineRecognizerConfig {
    let modelConfig = getNonStreamingWhisperTiny(language:language)
    let featConfig = sherpaOnnxFeatureConfig(
        sampleRate: 16000,
        featureDim: 80)

    return sherpaOnnxOfflineRecognizerConfig(
        featConfig: featConfig,
        modelConfig: modelConfig,
        decodingMethod: "greedy_search",
        maxActivePaths: 4
    )
}

func getNonStreamingWhisperTiny(language: String) -> SherpaOnnxOfflineModelConfig {
  let encoder = getResource("tiny-encoder.int8", "onnx")
  let decoder = getResource("tiny-decoder.int8", "onnx")
  let tokens = getResource("tiny-tokens", "txt")

  return sherpaOnnxOfflineModelConfig(
    tokens: tokens,
    whisper: sherpaOnnxOfflineWhisperModelConfig(
      encoder: encoder,
      decoder: decoder,
      language: language
    ),
    numThreads: 1,
    debug: 1,
    modelType: "whisper"
  )
}

Please let me know if I got this right. I have tested it locally on iOS and it works as expected.

I also exposed the recognized language back to the calling client in the result (for the use case of automatic recognition).
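
A minimal sketch of what that could look like on the C++ side, assuming that an empty configured language means "detect automatically". The result struct, its `lang` field, and the helper `MakeResult` are hypothetical stand-ins for illustration and are not taken from the PR.

```cpp
#include <string>

// Hypothetical, simplified result type; `lang` carries the language that
// was actually used for decoding back to the caller.
struct OfflineRecognitionResult {
  std::string text;
  std::string lang;
};

// Hypothetical helper: if the caller configured a language, report it;
// otherwise report the one the model detected from the audio.
OfflineRecognitionResult MakeResult(const std::string &text,
                                    const std::string &configured_lang,
                                    const std::string &detected_lang) {
  OfflineRecognitionResult r;
  r.text = text;
  r.lang = configured_lang.empty() ? detected_lang : configured_lang;
  return r;
}
```

A client that leaves the language empty can then read `lang` from the result to find out which language was recognized.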

I am still coming up to speed on this code base. I struggled to see how useful the GetConfig() will be for my particular use case, but I added it per your request.

Thank you!

csukuangfj commented 1 month ago

> I struggled to see how useful the GetConfig() will be for my particular use case, but I added it per your request.

You can remove it if you feel it is not needed at present.

The changes to the Swift code also look good to me. Thanks!

Offline Recognizer - Passing the Language for Multi-Language Models #1116