k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Confidence scores with Zipformer models #490

Open asterixvn opened 11 months ago

asterixvn commented 11 months ago

Hi all,

I am decoding a Zipformer model with sherpa-onnx (and k2/icefall), and I am wondering whether there is any way to get confidence scores for the hypothesized tokens with sherpa-onnx-offline or other tools.

If this is not possible, could you please share any hints on where I can find this information in the source code, or how I can generate the confidence scores myself? Pointers to k2/icefall scripts and programs would also be helpful.

Many thanks. Bac

csukuangfj commented 11 months ago

Yes, it is possible.

Taking the greedy search as an example,

https://github.com/k2-fsa/sherpa-onnx/blob/03ff9db56e9ed7c0252ae036be333de5db75a746/sherpa-onnx/csrc/offline-transducer-greedy-search-decoder.cc#L52-L55

You can get the log_prob of a token from the above code.

Note that you can compute log_softmax from the logits and then take the log_prob of the max token at time t.
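
To illustrate the idea outside the C++ decoder, here is a minimal Python sketch of that computation for a single frame. It is not the sherpa-onnx implementation: the function names and the `frame_logits` values are made up for the example, and it assumes you already have the joiner output (one logit per vocabulary token) for time t.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    # log(sum(exp(x))) computed with the max subtracted to avoid overflow
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def greedy_token_confidence(logits):
    """Return (token_id, log_prob, prob) for the argmax token of one frame.

    Hypothetical helper for illustration only; not part of sherpa-onnx.
    """
    log_probs = log_softmax(logits)
    token_id = max(range(len(logits)), key=lambda i: logits[i])
    log_prob = log_probs[token_id]
    return token_id, log_prob, math.exp(log_prob)


# Made-up joiner logits for one frame with a 4-token vocabulary.
frame_logits = [0.1, 2.5, -1.0, 0.3]
token_id, log_prob, prob = greedy_token_confidence(frame_logits)
print(token_id, log_prob, prob)
```

The log_prob (or its exponential) of the emitted token can then be stored alongside the token as a per-token confidence. How to aggregate these into word- or utterance-level scores is a separate design choice that this sketch does not cover.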

asterixvn commented 10 months ago

Thanks a lot, Fangjun. It helps!

KarelVesely84 commented 8 months ago

https://github.com/k2-fsa/sherpa-onnx/pull/571