Different PPGs from the same audio

guanlongzhao / fac-via-ppg

Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams (Interspeech'19)

Apache License 2.0


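The MFCC extraction is non-deterministic because of Kaldi's "dithering" step, which adds a small amount of random Gaussian noise to the input waveform before the features are computed. As a result, two runs on the same audio yield slightly different MFCCs, and therefore slightly different PPGs. See this thread and Dan's comments: https://groups.google.com/g/kaldi-help/c/LOD4A7Z9hYY/m/66ZL00fUAAAJ.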
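To make the effect concrete, here is a minimal pure-Python sketch of dithering. It is not Kaldi's implementation: `dither` and `frame_log_energy` are hypothetical helpers written for illustration, per-frame log energy stands in for a real MFCC pipeline, and the frame sizes (400 samples, shift of 160) and the synthetic 440 Hz tone are assumptions. It only shows that adding Gaussian noise with different random states changes the features slightly, while a dither value of 0 makes repeated runs identical.

```python
import math
import random


def dither(samples, dither_value=1.0, rng=None):
    """Add zero-mean Gaussian noise, scaled by dither_value, to each sample.

    Hypothetical helper for illustration; a dither_value of 0 disables the noise.
    """
    if dither_value == 0.0:
        return list(samples)
    if rng is None:
        rng = random.Random()  # unseeded, so every call differs
    return [s + dither_value * rng.gauss(0.0, 1.0) for s in samples]


def frame_log_energy(samples, frame_len=400, frame_shift=160):
    """Toy stand-in for a feature extractor: log energy of each frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_shift):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        feats.append(math.log(max(energy, 1e-10)))
    return feats


# Synthetic one-second tone at 16 kHz, on a 16-bit amplitude scale (assumed input).
wave = [10000.0 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]

# Two runs with different random states give different, but very close, features.
run_a = frame_log_energy(dither(wave, 1.0, random.Random(1)))
run_b = frame_log_energy(dither(wave, 1.0, random.Random(2)))
max_diff = max(abs(a - b) for a, b in zip(run_a, run_b))

# With dithering disabled, repeated runs are identical.
no_dither_a = frame_log_energy(dither(wave, 0.0))
no_dither_b = frame_log_energy(dither(wave, 0.0))

print("features differ with dither:", run_a != run_b)
print("max abs difference:", max_diff)
print("features identical without dither:", no_dither_a == no_dither_b)
```

If bit-for-bit reproducible features are needed, Kaldi's feature extraction exposes a dither option that can be set to 0; check the configuration used by this repository before relying on that, since the acoustic model was trained with dithering enabled.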

Does it affect the accuracy of the PPG and the model? If not, why?

It should not, since the acoustic model was trained with dithering enabled and already tolerates it. The accent conversion model was also trained with dropout, so it can tolerate small fluctuations in the PPG signal as well.
