mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

The real-time result differs from the non-real-time result. #129

Closed bringtree closed 2 years ago

bringtree commented 2 years ago

Hello, I am trying to make the real-time result match the non-real-time result.

I have replaced "rand()" with a constant value. Even so, I find that the real-time output (from WaveformSynthesis2 or WaveformSynthesis3) differs from the non-real-time output (from WaveformSynthesis).

bringtree commented 2 years ago

For example, in synthesis.cpp

  double coefficient = 2.0 * world::kPi * fractional_time_shift * fs / fft_size;
  GetSpectrumWithFractionalTimeShift(fft_size, coefficient, inverse_real_fft);

By the way, I can't find the corresponding step in synthesisrealtime.cpp.
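
As a rough illustration of what a fractional time shift of this kind does, the sketch below delays a signal by a possibly non-integer number of samples by rotating the phase of each DFT bin. It is not WORLD's code: `ShiftBySamples` is a hypothetical helper, a naive DFT stands in for the library's FFT, the shift is given directly in samples (the counterpart of `fractional_time_shift * fs` above), and the sign convention is an assumption.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical helper (not part of WORLD): circularly delays x by `shift`
// samples, where `shift` may be fractional, using a linear phase term.
std::vector<double> ShiftBySamples(const std::vector<double> &x, double shift) {
  const double kPi = 3.14159265358979323846;
  const int n = static_cast<int>(x.size());

  // Naive forward DFT, O(n^2); fine for a small demonstration.
  std::vector<std::complex<double>> spectrum(n);
  for (int k = 0; k < n; ++k) {
    std::complex<double> sum(0.0, 0.0);
    for (int t = 0; t < n; ++t)
      sum += x[t] * std::polar(1.0, -2.0 * kPi * k * t / n);
    spectrum[k] = sum;
  }

  // Phase step per bin, analogous to 2 * pi * shift_in_samples / fft_size.
  const double coefficient = 2.0 * kPi * shift / n;
  for (int k = 0; k < n; ++k) {
    // Treat the upper half of the bins as negative frequencies.
    const int signed_k = (k <= n / 2) ? k : k - n;
    spectrum[k] *= std::polar(1.0, -coefficient * signed_k);
  }

  // Naive inverse DFT; keep the real part.
  std::vector<double> y(n);
  for (int t = 0; t < n; ++t) {
    std::complex<double> sum(0.0, 0.0);
    for (int k = 0; k < n; ++k)
      sum += spectrum[k] * std::polar(1.0, 2.0 * kPi * k * t / n);
    y[t] = sum.real() / n;
  }
  return y;
}
```

With an 8-sample unit impulse at index 0, a shift of 1.0 moves the impulse to index 1, while a shift of 0.5 puts equal weight on samples 0 and 1 (with some ringing elsewhere). A synthesizer that skips this step can only place pulses on whole samples.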

mmorise commented 2 years ago

Thank you for your comment.

There are several differences between the real-time and offline synthesizers. One is the fractional time-shift component, as you pointed out, and another is the DC removal. The last is the calculation of the temporal positions of the vocal cord vibrations. Because of rounding, a temporal position in the real-time synthesizer often differs from the one in the offline synthesizer.
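
To make the rounding point concrete, here is a minimal sketch of how a pulse time can be split into a whole-sample index and a leftover fraction. The struct and function names are hypothetical, and round-to-nearest is an assumption; the actual rounding rule in either synthesizer may differ.

```cpp
#include <cmath>

// Hypothetical type for illustration only (not a WORLD struct).
struct PulsePosition {
  int sample_index;              // pulse time rounded to the nearest sample
  double fractional_time_shift;  // leftover in seconds; may be negative
};

// Splits a pulse time into an integer sample index and the remaining
// sub-sample offset. Assumption: round-to-nearest.
PulsePosition SplitPulseTime(double time_in_seconds, double fs) {
  PulsePosition p;
  p.sample_index = static_cast<int>(std::lround(time_in_seconds * fs));
  p.fractional_time_shift = time_in_seconds - p.sample_index / fs;
  return p;
}
```

At fs = 24000, a pulse at 0.0100125 s falls 0.3 samples after sample 240. A synthesizer that keeps only `sample_index` places the pulse up to half a sample off, whereas one that also applies `fractional_time_shift` can compensate. That is one way the two outputs can diverge even with identical input.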

bringtree commented 2 years ago

@mmorise Thanks, I also found several differences.

  1. fft_size or fft_size / 2? In synthesis.cpp:

    GetDCRemover(fft_size, dc_remover);

In synthesisrealtime.cpp:

    GetDCRemover(synth->fft_size / 2, synth->dc_remover);

In synthesis.cpp:

static void RemoveDCComponent(const double *periodic_response, int fft_size,
                              const double *dc_remover,
                              double *new_periodic_response) {
  double dc_component = 0.0;
  for (int i = fft_size / 2; i < fft_size; ++i)
    dc_component += periodic_response[i];
  for (int i = 0; i < fft_size / 2; ++i)
    new_periodic_response[i] = -dc_component * dc_remover[i];
  for (int i = fft_size / 2; i < fft_size; ++i)
    new_periodic_response[i] -= dc_component * dc_remover[i];
}

In synthesisrealtime.cpp: [image not preserved]

mmorise commented 2 years ago

Yes, DC removal in real-time synthesis differs from offline synthesis because of causality. However, the effect on sound quality is negligible.
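
The following sketch shows the general idea of window-based DC removal and why the window length matters. It does not reproduce either WORLD implementation: `MakeDcRemover` and `RemoveDc` are hypothetical helpers, the raised-cosine window shape is an assumption, and the DC value is taken over the whole response rather than over half of it.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: a raised-cosine window of `length` samples,
// normalized so that its samples sum to 1.
std::vector<double> MakeDcRemover(int length) {
  const double kPi = 3.14159265358979323846;
  std::vector<double> window(length);
  double sum = 0.0;
  for (int i = 0; i < length; ++i) {
    window[i] = 0.5 - 0.5 * std::cos(2.0 * kPi * (i + 1.0) / (length + 1.0));
    sum += window[i];
  }
  for (double &value : window) value /= sum;
  return window;
}

// Hypothetical helper: subtracts (sum of response) * remover from the first
// remover.size() samples. If the remover is no longer than the response,
// the result sums to zero.
std::vector<double> RemoveDc(const std::vector<double> &response,
                             const std::vector<double> &remover) {
  double dc_component = 0.0;
  for (double value : response) dc_component += value;
  std::vector<double> out = response;
  for (std::size_t i = 0; i < remover.size() && i < out.size(); ++i)
    out[i] -= dc_component * remover[i];
  return out;
}
```

Applying a full-length window (8 samples) and a half-length window (4 samples) to the same 8-sample response gives zero DC in both cases, but the individual samples differ. So a change in the length passed to the window builder is enough on its own to make two outputs disagree sample by sample.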

bringtree commented 2 years ago

@mmorise Thanks. About the buffer size (64): is it related to the sampling rate (24k)? I found that the buffer size changes the resulting wav.

mmorise commented 2 years ago

> Is the buffer size related to the sampling rate (24k)?

No, you can set the buffer size independently. I think the difference may be caused by rounding in the struct. The calculated temporal positions of the vocal cord vibrations may change as a result.