Closed hdmjdp closed 4 years ago
@mmorise Hi, do you know what causes this?
There are too many possible causes for me to answer from the provided figures alone. Please give me more detailed information and an explanation, including the waveform and source code.
@mmorise I extract the acoustic features (sp, bap, f0) using the analysis code with fft_len=512, and then I use those features directly to synthesize the wav below.
Thank you for your information.
The cause is insufficient signal length in low-F0 frames. D4C requires a frame length of 4*T0, where T0 = 1/F0 is the fundamental period. For example, when the F0 of a frame is 100 Hz, D4C uses a frame length of 40 ms. The sampling frequency of 512.wav is 24,000 Hz, so a 40 ms frame is 960 samples, and an fft_size of 512 cannot cover it. With an fft_size of 512, the lower limit for F0 is around 188 Hz (4 * 24,000 / 512 = 187.5 Hz).
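The arithmetic above can be checked with a short sketch. It assumes only the 4*T0 rule stated in this reply; the function names are hypothetical helpers for illustration and are not part of the WORLD API.

```python
def frame_length_samples(fs, f0, periods=4):
    """Samples needed to cover `periods` fundamental periods at this F0.

    Assumes the 4*T0 frame length described for D4C in this thread.
    """
    return periods * fs / f0


def min_f0_for_fft_size(fs, fft_size, periods=4):
    """Lowest F0 (Hz) whose frame of `periods` periods still fits in fft_size."""
    return periods * fs / fft_size


fs = 24000       # sampling frequency of 512.wav
fft_size = 512

needed = frame_length_samples(fs, 100.0)     # frame length at F0 = 100 Hz
limit = min_f0_for_fft_size(fs, fft_size)    # F0 lower limit for fft_size = 512

print(f"frame length at 100 Hz: {needed:.0f} samples")
print(f"F0 lower limit for fft_size={fft_size}: {limit:.1f} Hz")
```

Any frame whose F0 falls below the printed limit needs more samples than the FFT can hold.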
@mmorise Thank you, I will try.
@mmorise But if I use f0_floor=188, the synthesized wav is dummy, as below. How can I solve this? 188.wav.zip
Sorry, I was unclear. 188 Hz is the lowest F0 that can be analyzed/synthesized when you use an fft_size of 512 samples. In other words, with this fft_size you can only process speech whose F0 is at least 188 Hz in every frame. Unfortunately, your sample contains frames with an F0 below 188 Hz, so you cannot analyze/synthesize it with an fft_size of 512 samples.
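Turning the same relation around gives a rough estimate of how large the fft_size would have to be for a given lowest F0. This is only a sketch under the 4*T0 rule from this thread. Rounding up to a power of two is an assumption made here for illustration, and `min_fft_size` is a hypothetical helper, not a WORLD function.

```python
import math


def min_fft_size(fs, f0_floor, periods=4):
    """Smallest power-of-two FFT size covering `periods` periods at f0_floor.

    Assumes the 4*T0 frame length from this thread. The power-of-two
    rounding is an illustrative assumption.
    """
    required = math.ceil(periods * fs / f0_floor)   # samples needed
    return 1 << (required - 1).bit_length()         # round up to 2**k


fs = 24000
for f0_floor in (188.0, 100.0, 50.0):
    print(f"f0_floor={f0_floor:>5.1f} Hz -> fft_size >= {min_fft_size(fs, f0_floor)}")
```

Note that changing the fft_size also changes the dimension of the extracted spectral features, so analysis and synthesis would both need to use the same value.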