guanlongzhao / fac-via-ppg

Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams (Interspeech'19)
Apache License 2.0

Different PPGs from the same audio #21

Closed by Siboooo 1 year ago

Siboooo commented 1 year ago

Hey, bro. Thank you for sharing your great work. I am trying to extract PPG features from my own audio, but the resulting features differ from run to run even when the input audio is the same. I traced this to the Kaldi function "mfcc.compute_features", which returns different MFCCs for the same input. Does it affect the accuracy of the PPG and the model? If not, why?

guanlongzhao commented 1 year ago

The non-deterministic MFCC extraction is due to the "dithering" step in Kaldi, which adds a small amount of random Gaussian noise to the input waveform. See this thread and Dan's comments: https://groups.google.com/g/kaldi-help/c/LOD4A7Z9hYY/m/66ZL00fUAAAJ
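To make the effect concrete, here is a minimal NumPy sketch of dithering. It is not Kaldi's code and not this repo's `mfcc.compute_features`: `add_dither` and `toy_features` are hypothetical helpers, and the log power spectrum only stands in for real MFCCs. The sketch shows that two dithered runs on the same waveform give different features, while a dither constant of 0 or a fixed random seed gives repeatable ones.

```python
import numpy as np

def add_dither(waveform, dither_value=1.0, rng=None):
    """Add zero-mean Gaussian noise, scaled by dither_value, to every sample."""
    rng = np.random.default_rng() if rng is None else rng
    return waveform + dither_value * rng.standard_normal(waveform.shape)

def toy_features(waveform, frame_len=400, hop=160):
    """Log power spectrum per frame: a stand-in for MFCCs, not Kaldi's pipeline."""
    n_frames = 1 + (len(waveform) - frame_len) // hop
    frames = np.stack(
        [waveform[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    spectrum = np.abs(np.fft.rfft(frames * np.hamming(frame_len), axis=1)) ** 2
    return np.log(spectrum + 1e-10)

# One second of a 440 Hz tone at 16 kHz, scaled to the 16-bit integer range.
t = np.arange(16000) / 16000.0
wave = 3000.0 * np.sin(2 * np.pi * 440.0 * t)

# Unseeded dithering: each call draws fresh noise.
run_a = toy_features(add_dither(wave))
run_b = toy_features(add_dither(wave))

# Dither constant 0: the waveform passes through unchanged.
no_dither_a = toy_features(add_dither(wave, dither_value=0.0))
no_dither_b = toy_features(add_dither(wave, dither_value=0.0))

# Fixed seed: the same noise is drawn both times.
seeded_a = toy_features(add_dither(wave, rng=np.random.default_rng(0)))
seeded_b = toy_features(add_dither(wave, rng=np.random.default_rng(0)))

print("dithered runs identical:  ", np.array_equal(run_a, run_b))
print("undithered runs identical:", np.array_equal(no_dither_a, no_dither_b))
print("seeded runs identical:    ", np.array_equal(seeded_a, seeded_b))
```

Kaldi's own feature extraction options include a dither constant, and 0 disables dithering. Whether and how the wrapper used in this repo exposes that option is not confirmed here, so check the wrapper before relying on it.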

Does it affect the accuracy of the PPG and the model? If not, why?

It should not, since the acoustic model was already trained to tolerate the "dithering". The accent conversion model was also trained with dropout, so it can tolerate small fluctuations in the PPG signal as well.
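If you want to check how small the fluctuations are on your own audio, one option is to extract the PPG twice and compare the two results. The sketch below is an illustration only: `compare_ppgs` is a hypothetical helper that is not part of this repo, and the two arrays are synthetic stand-ins with a made-up shape of (frames, classes). Replace them with two real extractions from the same file.

```python
import numpy as np

def compare_ppgs(ppg_a, ppg_b):
    """Return (max absolute difference, fraction of frames with the same top class)."""
    if ppg_a.shape != ppg_b.shape:
        raise ValueError("PPGs must have the same (frames, classes) shape")
    max_abs_diff = float(np.max(np.abs(ppg_a - ppg_b)))
    top_agreement = float(np.mean(ppg_a.argmax(axis=1) == ppg_b.argmax(axis=1)))
    return max_abs_diff, top_agreement

def softmax(logits):
    """Row-wise softmax, so each frame is a probability distribution."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Synthetic stand-ins for two extractions of the same audio (assumed shapes).
# The second run perturbs the logits slightly to imitate the effect of dithering.
rng = np.random.default_rng(0)
logits = rng.normal(scale=3.0, size=(200, 40))
ppg_run1 = softmax(logits)
ppg_run2 = softmax(logits + rng.normal(scale=0.01, size=logits.shape))

max_diff, agreement = compare_ppgs(ppg_run1, ppg_run2)
print(f"max abs diff: {max_diff:.4f}, top-class agreement: {agreement:.3f}")
```

A small maximum difference together with a top-class agreement close to 1 would be consistent with the explanation above. What counts as small enough is a judgment call that depends on your data.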