kan-bayashi / PytorchWaveNetVocoder

WaveNet-Vocoder implementation with pytorch.
https://kan-bayashi.github.io/WaveNetVocoderSamples/
Apache License 2.0

Some questions about the subjective evaluation (MOS chart) #38

Closed unilight closed 5 years ago

unilight commented 6 years ago

If my understanding is correct, the vocoders on the MOS chart were evaluated under the condition that the inputs to the vocoders were features extracted with STRAIGHT, and the outputs were raw waveforms. If so, then how come STRAIGHT got such a low score? Shouldn't it score as high as the raw waveform does?

kan-bayashi commented 6 years ago

Hi @unilight. Thank you for your question! You can listen to the samples here. As you will hear, the wnv samples are almost the same as raw speech. In the subjective evaluation, the STRAIGHT samples sound clearly different from the wnv samples, so subjects tended to give them low scores. Furthermore, because we want to compare performance as a vocoder, the feature-extraction setting is the same for both STRAIGHT and the WaveNet vocoder (5 ms shift, 24th-order mcep). This degrades STRAIGHT's performance. If we used a shorter shift and the full spectrum for STRAIGHT, its quality would improve.
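To illustrate why the low-order setting hurts STRAIGHT, here is a hypothetical sketch (not the repo's actual feature extraction, which uses mel-cepstrum via WORLD/SPTK): it applies plain cepstral truncation to a synthetic frame with NumPy, showing how keeping only ~25 coefficients smooths away fine spectral detail. The sampling rate, frame length, and the signal itself are assumptions for illustration only.

```python
# Hypothetical sketch: plain (non-mel) cepstral truncation on a synthetic
# frame, to show how a 24th-order envelope discards fine spectral detail.
# fs, frame_len, and the test signal are illustrative assumptions.
import numpy as np

fs = 16000                       # assumed sampling rate (Hz)
shift = int(0.005 * fs)          # 5 ms frame shift -> 80 samples
frame_len = 512
order = 24                       # cepstral order, as in the comment above

# Synthetic harmonic-rich frame standing in for a speech frame
t = np.arange(frame_len) / fs
frame = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 450 * t)
frame *= np.hanning(frame_len)

# Log-magnitude spectrum -> real cepstrum
spec = np.fft.rfft(frame)
log_mag = np.log(np.abs(spec) + 1e-10)
cep = np.fft.irfft(log_mag)

# Keep only the first order+1 coefficients (and their mirror, since the
# real cepstrum is symmetric); everything else is zeroed out.
cep_trunc = np.zeros_like(cep)
cep_trunc[: order + 1] = cep[: order + 1]
cep_trunc[-order:] = cep[-order:]

# Transform back: a heavily smoothed log-spectral envelope
env = np.fft.rfft(cep_trunc).real

print("frame shift (samples):", shift)
print("spectral bins:", log_mag.shape[0], "-> envelope from", order + 1, "coefficients")
```

The point is that both vocoders receive this coarse envelope, but STRAIGHT must resynthesize directly from it, while the WaveNet vocoder learns to fill in the missing detail.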