FYI: Run models from piper with the Next-gen Kaldi subproject sherpa-onnx

rhasspy / piper

A fast, local neural text to speech system

https://rhasspy.github.io/piper-samples/

MIT License

FYI: Run models from piper with the Next-gen Kaldi subproject sherpa-onnx #251

Open csukuangfj opened 6 months ago

csukuangfj commented 6 months ago

FYI: piper models are now supported in https://github.com/k2-fsa/sherpa-onnx

Note that it does not depend on https://github.com/rhasspy/piper-phonemize

sherpa-onnx supports a variety of platforms, such as

Windows (x86, x64)
Linux (x64, arm, arm64), e.g., Raspberry Pi
macOS (x64, arm64)

It also provides APIs for various programming languages, e.g., C/C++/Python/Kotlin/Swift/C#/Go. We also have Android APKs for TTS.

You can find the installation doc at https://k2-fsa.github.io/sherpa/onnx/install/index.html

You can find the usage of piper models with sherpa-onnx at https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/vits.html#lessac-blizzard2013-medium-english-single-speaker

We also have a huggingface space for you to try piper models with sherpa-onnx. Please visit https://huggingface.co/spaces/k2-fsa/text-to-speech

You can find the PR supporting piper in sherpa-onnx at https://github.com/k2-fsa/sherpa-onnx/pull/390
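
For readers who want a rough idea of what running a piper model through the Python API might look like, here is a minimal sketch. It is not taken from this thread: the class and argument names (`OfflineTtsVitsModelConfig`, `OfflineTtsModelConfig`, `OfflineTtsConfig`, `OfflineTts.generate`) follow my understanding of the example scripts in the sherpa-onnx repository and may differ between versions. The file arguments (`model`, `tokens`, `data_dir`) are placeholders for whatever the downloaded model directory actually contains. Please treat the usage doc linked above as the authoritative reference.

```python
import struct
import wave


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def write_wav(path, samples, sample_rate):
    """Write mono 16-bit audio using only the standard library."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(float_to_pcm16(samples))


def synthesize(text, model, tokens, data_dir, out_wav, sid=0, speed=1.0):
    """Generate speech with a VITS (e.g. piper) model and save it as a WAV file.

    All paths are placeholders supplied by the caller. The sherpa_onnx names
    below are assumptions based on the project's example scripts.
    """
    # Imported lazily so the helpers above work without sherpa-onnx installed.
    import sherpa_onnx  # assumed install: pip install sherpa-onnx

    vits = sherpa_onnx.OfflineTtsVitsModelConfig(
        model=model,        # path to the .onnx model file
        lexicon="",         # assumed unused when data_dir is given
        tokens=tokens,      # path to tokens.txt
        data_dir=data_dir,  # assumed: the espeak-ng-data directory shipped with the model
    )
    config = sherpa_onnx.OfflineTtsConfig(
        model=sherpa_onnx.OfflineTtsModelConfig(
            vits=vits, num_threads=1, provider="cpu"
        )
    )
    tts = sherpa_onnx.OfflineTts(config)
    audio = tts.generate(text, sid=sid, speed=speed)
    write_wav(out_wav, audio.samples, audio.sample_rate)
    return audio.sample_rate
```

The WAV writing uses only the standard `wave` module so that the sketch has no dependency beyond sherpa-onnx itself. Calling `synthesize("Hello", model, tokens, data_dir, "out.wav")` with real paths should produce a mono 16-bit file at the model's own sample rate.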

nanaghartey commented 3 weeks ago

@csukuangfj currently, which models sound closest to human quality on sherpa-onnx: Coqui or piper TTS models? And are these two the only ones sherpa-onnx supports?

csukuangfj commented 3 weeks ago

Please visit https://huggingface.co/spaces/k2-fsa/text-to-speech to try all supported tts models.

There are more than 100 TTS models, and the best way to find out which one sounds best to you is to try them yourself. You don't need to install anything to try them.

csukuangfj commented 3 weeks ago

And are these two the only ones sherpa-onnx supports?

No.

sherpa-onnx currently supports VITS TTS models, and it is not limited to Coqui or piper.

nanaghartey commented 3 weeks ago

Please visit https://huggingface.co/spaces/k2-fsa/text-to-speech to try all supported tts models.

There are more than 100 tts models and the best way to find out which model sounds best to you is to try it by yourself.

You don't need to install anything to try it.

I actually tried a couple of them in the past. I was hoping you'd have a "top 3" model list. What I noticed with sherpa-onnx is that there's a trade-off between quality and on-device processing compared to the cloud solutions out there. For example, standard Coqui TTS models sound okay, but once converted to sherpa-onnx the quality and intonation go down. Are there any tips or tricks for getting good quality on sherpa-onnx?

csukuangfj commented 3 weeks ago

For example, standard Coqui TTS models sound okay, but once converted to sherpa-onnx the quality and intonation go down

Could you describe which model you are using? @nanaghartey

nanaghartey commented 3 weeks ago

For example, standard Coqui TTS models sound okay, but once converted to sherpa-onnx the quality and intonation go down

Could you describe which model you are using? @nanaghartey

I'm using my own fine-tuned Coqui and piper TTS VITS models. Both sound good before conversion to sherpa-onnx, but the same drop in quality also happens with the various other English models I tried out.