k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

max speakers for speaker embedding manager #1096

Closed thewh1teagle closed 1 week ago

thewh1teagle commented 2 weeks ago

It would be useful if there were an option in the embedding manager to find the closest match among the existing speakers. That way, I can handle the case myself where I know how many speakers there are and enough speakers have already been detected.

csukuangfj commented 1 week ago

That sounds reasonable.

Could you follow https://github.com/k2-fsa/sherpa-onnx/blob/3e4307e2fb88d4b1b648211c14f2fff6db11bca4/sherpa-onnx/csrc/speaker-embedding-manager.cc#L126 to add a TopK to return the name and scores for the topK match?

thewh1teagle commented 1 week ago

> to add a TopK to return the name and scores for the topK match?

I assume we just need to sort the scores, then iterate through them, collect, and return. Could you share any specific IntelliSense and formatting settings used in the repository so I can feel more comfortable working in VSCode?
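The sort-and-collect idea above could be sketched roughly as follows. This is only an illustration, not the actual sherpa-onnx code: the `SpeakerMatch` struct, the free function `TopK`, the `std::unordered_map` storage, and the use of cosine similarity as the score are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: a speaker name and its similarity score.
struct SpeakerMatch {
  std::string name;
  float score;
};

// Cosine similarity between two embeddings of equal dimension.
// Returns 0 if either vector has zero norm.
static float CosineSimilarity(const std::vector<float> &a,
                              const std::vector<float> &b) {
  assert(a.size() == b.size());
  float dot = 0, na = 0, nb = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na == 0 || nb == 0) return 0;
  return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Hypothetical TopK: scores every enrolled speaker against the query and
// returns at most k of them, sorted by descending score.
std::vector<SpeakerMatch> TopK(
    const std::unordered_map<std::string, std::vector<float>> &speakers,
    const std::vector<float> &query, int k) {
  std::vector<SpeakerMatch> matches;
  matches.reserve(speakers.size());
  for (const auto &kv : speakers) {
    matches.push_back({kv.first, CosineSimilarity(kv.second, query)});
  }

  if (k < 0) k = 0;
  std::size_t n = std::min(static_cast<std::size_t>(k), matches.size());

  // Only the first n entries need to be ordered, so a partial sort suffices.
  std::partial_sort(matches.begin(), matches.begin() + n, matches.end(),
                    [](const SpeakerMatch &x, const SpeakerMatch &y) {
                      return x.score > y.score;
                    });
  matches.resize(n);
  return matches;
}
```

With a helper like this, a caller who already knows the number of speakers could take `TopK(speakers, embedding, 1)` and assign the segment to the best match regardless of any threshold.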

csukuangfj commented 1 week ago

We are using clang-format, which can be installed with

pip install clang-format

I don't use VSCode. Maybe you can find a way to integrate clang-format with it.

You don't need to care about the style issues. We can reformat the file later.