mlcommons / peoples-speech

The People’s Speech Dataset
https://mlcommons.org/en/peoples-speech/
Apache License 2.0

Implement text normalization #30

Open galv opened 3 years ago

galv commented 3 years ago

We are lacking text normalization at the moment, which qualifies in my book as a serious flaw.

I recommend Sparrowhawk (https://github.com/google/sparrowhawk) as a first step. Ideally we would wrap its build system in Bazel, but we could also just add it to the Dockerfile.

galv commented 3 years ago

https://github.com/rhasspy/gruut may be a good alternative: it supports multiple languages and has no C++ dependency to install.
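
For reference, here is a rough sketch of the kind of transcript normalization this issue is asking for. It is not Sparrowhawk or gruut and uses neither library's API; it is a stdlib-only placeholder, and the function name `normalize_transcript` and the `_DIGITS` table are hypothetical. It assumes English text and reads digits out one at a time, which a real normalizer would replace with proper number verbalization.

```python
import re
import unicodedata

# Hypothetical digit table: reads "42" as "four two", not "forty two".
_DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize_transcript(text: str) -> str:
    """Very rough English-only normalization for ASR transcripts."""
    # Fold Unicode compatibility forms (full-width letters, ligatures, ...).
    text = unicodedata.normalize("NFKC", text)
    # Treat the typographic apostrophe like the ASCII one.
    text = text.replace("\u2019", "'")
    text = text.lower()
    # Spell out ASCII digits one by one, padded so they become separate words.
    text = re.sub(r"[0-9]", lambda m: f" {_DIGITS[m.group()]} ", text)
    # Replace everything except letters, apostrophes, and spaces with a space.
    text = re.sub(r"[^a-z' ]+", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_transcript("Dr. Smith paid $42, didn't he?"))
```

Things like expanding "Dr." or reading "$42" as "forty-two dollars" are exactly what a dedicated normalizer such as the two tools above would be needed for.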