rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

difference between use_sdp=True and use_sdp=False #502

Open bzp83 opened 2 weeks ago

bzp83 commented 2 weeks ago

I trained a model twice from scratch with the same dataset and same everything, except that one used use_sdp=True and the other use_sdp=False.

I can't see any difference, except that training with use_sdp=False is faster and the exported ONNX model is slightly smaller. I couldn't notice any difference in inference.

So what's the benefit of using sdp?

synesthesiam commented 2 weeks ago

From what I understand of VITS (the model architecture Piper uses), the SDP (stochastic duration predictor) models the duration of each phoneme as a probability distribution and samples from it, instead of predicting a single fixed duration per phoneme.

I'd guess this means it will do better for intonation? So maybe mostly for audiobooks?
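To make the distinction concrete, here is a toy sketch of the two ideas. It is not the VITS or Piper implementation (VITS uses a flow-based model for the stochastic predictor); the function names, the `noise_scale` parameter, and the log-normal noise are assumptions chosen only to illustrate "one fixed duration" versus "a sampled duration".

```python
import math
import random


def deterministic_durations(mean_frames):
    """Deterministic predictor: the same input always gives the same durations."""
    return [max(1, round(m)) for m in mean_frames]


def stochastic_durations(mean_frames, noise_scale=0.3, rng=None):
    """Toy stochastic predictor: sample each duration around its mean.

    Noise is applied in the log domain so durations stay positive.
    noise_scale is a hypothetical knob, not a Piper setting.
    """
    rng = rng or random.Random()
    return [
        max(1, round(m * math.exp(rng.gauss(0.0, noise_scale))))
        for m in mean_frames
    ]


if __name__ == "__main__":
    means = [4.0, 9.0, 6.0]  # made-up mean durations (in frames) for three phonemes
    print("deterministic:", deterministic_durations(means))
    for seed in range(3):
        print("stochastic:   ", stochastic_durations(means, rng=random.Random(seed)))
```

With the deterministic version the rhythm of a sentence is identical on every run; with the sampled version it varies slightly from run to run, which is the kind of difference that may be hard to hear on short test sentences.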

bzp83 commented 2 weeks ago

Got it! Yes, maybe it would work better for audiobooks.

By the way, I set use_sdp to False by manually editing the source code. Is there a way to pass this parameter on the command line? Would passing --use_sdp=False do the same?
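One thing worth checking before relying on a flag like that: this thread does not confirm whether the training script exposes `--use_sdp`, and in plain Python `argparse`, a boolean option declared with `type=bool` treats the string "False" as true. The sketch below uses a hypothetical parser (not Piper's actual command line) to show the pitfall and one common workaround.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse common true/false spellings; reject anything else."""
    v = value.strip().lower()
    if v in {"true", "1", "yes"}:
        return True
    if v in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_naive_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: type=bool calls bool("False"), which is True,
    # because any non-empty string is truthy.
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_sdp", type=bool, default=True)
    return parser


def build_safe_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: an explicit converter handles "False" correctly.
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_sdp", type=str_to_bool, default=True)
    return parser


if __name__ == "__main__":
    argv = ["--use_sdp=False"]
    print("naive:", build_naive_parser().parse_args(argv).use_sdp)
    print("safe: ", build_safe_parser().parse_args(argv).use_sdp)
```

If the flag is accepted, it is still worth printing or logging the resolved value at startup to confirm that the model was actually built with the setting you intended.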