k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

[help wanted] Add endpointing #229

Open csukuangfj opened 1 year ago

csukuangfj commented 1 year ago

We currently have

The steps to add it to online ASR are:

(1) Move endpoint.h from csrc to cpp_api.
(2) Add a method Register(ParseOptions *po) to EndpointConfig and invoke it inside OnlineRecognizerConfig::Register().
(3) Add a member bool enable_endpoint to OnlineRecognizerConfig. If it is false, endpointing is disabled entirely.
(4) Online greedy search and modified beam search already set the field num_trailing_blanks. We need to change online fast beam search so that it sets num_trailing_blanks as well. The code that needs to be changed is https://github.com/k2-fsa/sherpa/blob/00f09f97c03459c3a23084eb0968702d2fce8d4d/sherpa/csrc/online-transducer-fast-beam-search-decoder.cc#L108-L111

At that location, add:

    r.num_trailing_blanks = 0;

Change https://github.com/k2-fsa/sherpa/blob/00f09f97c03459c3a23084eb0968702d2fce8d4d/sherpa/csrc/online-transducer-fast-beam-search-decoder.cc#L125-L128 to


    if (token == 0) {
      ++t;
      ++p->num_trailing_blanks;
      continue;
    }

    p->num_trailing_blanks = 0;
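The counting rule from step (4) can be checked in isolation with the minimal C++ sketch below. SketchResult and AccumulateTokens are hypothetical names, not sherpa's API; the per-frame token list stands in for the tokens read off the fast beam search best path, with 0 as the blank, and the frame index ++t from the real loop is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the decoder result. Only num_trailing_blanks
// comes from the issue; tokens is added so the sketch has something to emit.
struct SketchResult {
  std::vector<int32_t> tokens;      // non-blank tokens emitted so far
  int32_t num_trailing_blanks = 0;  // blanks since the last non-blank token
};

// Mirrors the proposed change: a blank (token 0) increments the counter,
// and any non-blank token resets it to 0 before being appended.
void AccumulateTokens(const std::vector<int32_t> &frame_tokens,
                      SketchResult *p) {
  assert(p != nullptr);
  for (int32_t token : frame_tokens) {
    if (token == 0) {
      ++p->num_trailing_blanks;
      continue;
    }
    p->num_trailing_blanks = 0;
    p->tokens.push_back(token);
  }
}
```

Because the counter lives in the result rather than in a local variable, silence that spans several streaming chunks keeps accumulating until a non-blank token arrives, which is what endpoint detection needs.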

(5) Modify the code at https://github.com/k2-fsa/sherpa/blob/00f09f97c03459c3a23084eb0968702d2fce8d4d/sherpa/cpp_api/online-recognizer.cc#L310. After getting the result, use its num_trailing_blanks to decide whether an endpoint has been detected. If so, reset the stream with something like:

    auto r = decoder_->GetEmptyResult();
    s->SetResult(r);

Note that we also need to increase segment and start_frame; see https://github.com/k2-fsa/sherpa/blob/00f09f97c03459c3a23084eb0968702d2fce8d4d/sherpa/cpp_api/online-stream.h#L32-L35

We don't need to reset the state of the model and the feature extractor.
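Steps (3) and (5) can be sketched end to end in C++ as below. Everything here is hypothetical: SketchEndpointRule and SketchEndpointConfig are not the contents of endpoint.h, the frames_decoded and has_nonblank fields are assumed extras next to num_trailing_blanks, the three thresholds are illustrative, and frame_shift_s (seconds per decoder output frame, e.g. a 10 ms feature shift times a subsampling factor of 4) is an assumption. Moving start_frame to the number of frames processed so far is likewise an assumed reading of "increase start_frame".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical endpoint rule: fires when trailing silence and utterance
// length both reach their thresholds.
struct SketchEndpointRule {
  bool must_contain_nonsilence;  // require at least one non-blank token
  float min_trailing_silence;    // seconds
  float min_utterance_length;    // seconds
};

struct SketchEndpointConfig {
  bool enable_endpoint = true;  // step (3): false disables endpointing
  // Illustrative thresholds only.
  SketchEndpointRule rules[3] = {
      {false, 2.4f, 0.0f},   // long silence, even if nothing was decoded
      {true, 1.2f, 0.0f},    // shorter silence after some speech
      {false, 0.0f, 20.0f},  // cut an utterance that runs too long
  };
};

// Per-segment decoding status; only num_trailing_blanks is from the issue.
struct DecodeStatus {
  int32_t num_trailing_blanks = 0;
  int32_t frames_decoded = 0;  // hypothetical: frames since the last reset
  bool has_nonblank = false;   // hypothetical: any token emitted yet
};

// Hypothetical stream holding the fields step (5) says to update.
struct SketchStream {
  DecodeStatus result;
  int32_t segment = 0;               // index of the current utterance
  int32_t start_frame = 0;           // first frame of the current segment
  int32_t num_processed_frames = 0;  // total frames fed to the decoder
};

bool RuleActivated(const SketchEndpointRule &rule, bool has_nonblank,
                   float trailing_silence, float utterance_length) {
  if (rule.must_contain_nonsilence && !has_nonblank) return false;
  return trailing_silence >= rule.min_trailing_silence &&
         utterance_length >= rule.min_utterance_length;
}

// Converts frame counts to seconds and checks every rule.
bool IsEndpoint(const SketchEndpointConfig &config, const DecodeStatus &r,
                float frame_shift_s) {
  assert(frame_shift_s > 0.0f);
  if (!config.enable_endpoint) return false;
  float trailing = r.num_trailing_blanks * frame_shift_s;
  float length = r.frames_decoded * frame_shift_s;
  for (const SketchEndpointRule &rule : config.rules) {
    if (RuleActivated(rule, r.has_nonblank, trailing, length)) return true;
  }
  return false;
}

// Called after each decoding step. On an endpoint, start a new segment with
// an empty result (the analogue of decoder_->GetEmptyResult()); model and
// feature-extractor state are deliberately left untouched.
bool MaybeResetOnEndpoint(const SketchEndpointConfig &config,
                          float frame_shift_s, SketchStream *s) {
  if (!IsEndpoint(config, s->result, frame_shift_s)) return false;
  s->segment += 1;
  s->start_frame = s->num_processed_frames;
  s->result = DecodeStatus{};
  return true;
}
```

Keeping detection (IsEndpoint) separate from the reset makes it easy to log or surface the final result of a segment before the stream's result is replaced.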

pingfengluo commented 1 year ago

I will give a hand

csukuangfj commented 1 year ago

Thanks!