wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

[wenet/utils] add kmeans in torch with wenet #2545

Open Mddct opened 1 month ago

Mddct commented 1 month ago

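kmeans is a commonly used tool. This PR implements it on top of wenet's speech models, with features extracted online.

Use cases:
- cluster speech encoder outputs into discrete ids -> LLM speech understanding (ASR, etc.)
- cluster hubert/w2vbert features into discrete semantic tokens -> semantic tokens for TTS, etc.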
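For readers unfamiliar with the idea, here is a minimal sketch of k-means in torch and of turning frame-level features into discrete ids. It is not the code in this PR: the function names `kmeans` and `assign_ids`, the random-frame initialisation, and the plain (offline) Lloyd iteration are all assumptions made for illustration, and the features are assumed to be already extracted as an `(N, D)` tensor.

```python
import torch


def assign_ids(feats: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    """Map each frame in feats (N, D) to the index of its nearest centroid (K, D)."""
    dists = torch.cdist(feats, centroids)  # (N, K) Euclidean distances
    return dists.argmin(dim=1)


def kmeans(feats: torch.Tensor, num_clusters: int, num_iters: int = 20,
           seed: int = 0) -> torch.Tensor:
    """Plain Lloyd k-means (hypothetical helper); returns centroids (K, D)."""
    gen = torch.Generator().manual_seed(seed)
    # Initialise centroids from randomly chosen frames.
    perm = torch.randperm(feats.size(0), generator=gen)[:num_clusters]
    centroids = feats[perm].clone()
    for _ in range(num_iters):
        ids = assign_ids(feats, centroids)
        # Sum the frames of each cluster, then divide by the cluster size.
        sums = torch.zeros_like(centroids).index_add_(0, ids, feats)
        counts = torch.bincount(ids, minlength=num_clusters).unsqueeze(1)
        # Keep the old centroid when a cluster receives no frames.
        centroids = torch.where(counts > 0, sums / counts.clamp(min=1), centroids)
    return centroids
```

With encoder outputs flattened to `(num_frames, dim)`, `assign_ids(feats, kmeans(feats, K))` would give one discrete id per frame, which is the kind of token sequence the use cases above refer to.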

TODO:

Mddct commented 1 month ago

It works on aishell with 8 GPUs!

[Image: Screenshot 2024-05-31 19 36 01]

Mddct commented 3 weeks ago

Encoding and saving to file works.

[Image: Screenshot 2024-06-04 20 00 21]
[Image: Screenshot 2024-06-04 20 11 57]