bzp83 opened this issue 2 weeks ago (status: Open)
From what I understand of VITS (the model architecture Piper uses), SDP predicts the duration of phonemes with a probability distribution (SDP = stochastic duration predictor).
I'd guess this means it will do better for intonation? So maybe mostly for audiobooks?
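To make the difference concrete, here is a toy sketch of the two behaviours. It is not the VITS or Piper code: the real SDP is a flow-based network conditioned on the text, and the plain Gaussian perturbation below is only a stand-in for it. The function names are made up for illustration, and `noise_scale_w` is borrowed, as far as I recall, from the name of the duration-noise argument in the VITS reference inference code.

```python
import math
import random


def deterministic_durations(log_durations, length_scale=1.0):
    """Toy deterministic predictor: one fixed frame count per phoneme.

    `log_durations` stands in for the per-phoneme log-durations a trained
    network would output; the same input always yields the same timing.
    """
    return [max(1, math.ceil(math.exp(lw) * length_scale)) for lw in log_durations]


def stochastic_durations(log_durations, noise_scale_w=0.8, length_scale=1.0, rng=None):
    """Toy stochastic predictor: sample a duration around the predicted value.

    ASSUMPTION: Gaussian noise added in the log domain replaces the
    normalizing flow that a real SDP would use. `noise_scale_w` scales the
    randomness; 0 collapses to the deterministic result.
    """
    rng = rng or random.Random()
    noisy = [lw + rng.gauss(0.0, 1.0) * noise_scale_w for lw in log_durations]
    return deterministic_durations(noisy, length_scale)


if __name__ == "__main__":
    log_d = [0.0, 1.0, 2.0]  # hypothetical log-durations for three phonemes
    print("deterministic:", deterministic_durations(log_d))
    for _ in range(3):
        # Each call may give a slightly different rhythm for the same text.
        print("stochastic:   ", stochastic_durations(log_d))
```

If this picture is roughly right, the benefit of SDP would show up as more varied rhythm across repeated syntheses of the same sentence rather than as a change in overall quality, which might be hard to hear on short test phrases.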
Got it! Yes, maybe it would work better for audiobooks.
Btw, I set use_sdp to false by manually editing the source code. Is there a way to pass this parameter on the command line? Would passing --use_sdp=False do the same?
I trained a model twice from scratch with the same dataset and same everything, except that one used use_sdp=True and the other use_sdp=False.
I can't notice any difference in the output at inference. The only differences I found are that training with use_sdp=False is faster and the exported ONNX model is slightly smaller.
So what's the benefit of using sdp?