mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

Function [Synthesis] Error : Max value of y >=1.0 #50

Closed langmaninternet closed 6 years ago

langmaninternet commented 6 years ago

I found a small bug in the Synthesis function:

Sometimes the maximum value of y is >= 1.0, which leads to a bad output WAV file.

It can be fixed like this:

In void Synthesis(const double *f0, int f0_length, const double * const *spectrogram, const double * const *aperiodicity, int fft_size, double frame_period, int fs, int y_length, double *y):

    frame_period /= 1000.0;
    int noise_size;
/****/  double max_value_of_y = 0.0;
/****/  const double expect_max_value_of_y = 0.9999;
    for (int i = 0; i < number_of_pulses; ++i) {
        noise_size = pulse_locations_index[MyMinInt(number_of_pulses - 1, i + 1)] -
            pulse_locations_index[i];
        GetOneFrameSegment(interpolated_vuv[pulse_locations_index[i]], noise_size,
            spectrogram, fft_size, aperiodicity, f0_length, frame_period,
            pulse_locations[i], pulse_locations_time_shift[i], fs,
            &forward_real_fft, &inverse_real_fft, &minimum_phase, dc_remover,
            impulse_response);
        int index = 0;
        for (int j = 0; j < fft_size; ++j) {
            index = j + pulse_locations_index[i] - fft_size / 2 + 1;
            if (index < 0 || index > y_length - 1) continue;
            y[index] += impulse_response[j];
/****/          // Track the peak magnitude so that large negative samples count too.
/****/          if (max_value_of_y < fabs(y[index])) max_value_of_y = fabs(y[index]);
        }
    }
/****/  // Scale the whole signal down if its peak exceeds the expected maximum.
/****/  if (max_value_of_y > expect_max_value_of_y)
/****/  {
/****/      double reduce_coefficient = expect_max_value_of_y / max_value_of_y;
/****/      for (int i = 0; i < y_length; ++i) y[i] *= reduce_coefficient;
/****/  }
    delete[] dc_remover;
    delete[] pulse_locations;
    delete[] pulse_locations_index;
    delete[] pulse_locations_time_shift;
    delete[] interpolated_vuv;

mmorise commented 6 years ago

Thank you for your proposal.

Unfortunately, I don't think this is a good idea, because some file formats do not limit the range to -1 to 1 (e.g. .aiff). On the other hand, since amplitude normalization for each file format is important, a function for this purpose should be implemented independently, as a processing step after synthesis.
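
For illustration, an independent post-synthesis normalization step along these lines might look like the sketch below. The name NormalizePeak and its parameters are hypothetical and not part of WORLD; the target peak is left to the caller because, as noted above, the valid range depends on the output file format.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of WORLD): scales y in place so that its
// peak magnitude does not exceed target_peak. Signals that are already
// within range are left untouched. Returns the gain that was applied.
double NormalizePeak(double *y, std::size_t y_length, double target_peak) {
  assert(target_peak > 0.0);

  // Find the largest absolute sample value, positive or negative.
  double peak = 0.0;
  for (std::size_t i = 0; i < y_length; ++i) {
    peak = std::max(peak, std::fabs(y[i]));
  }

  // Nothing to do if the signal already fits.
  if (peak <= target_peak) return 1.0;

  // Apply one gain to every sample so the waveform shape is preserved.
  const double gain = target_peak / peak;
  for (std::size_t i = 0; i < y_length; ++i) {
    y[i] *= gain;
  }
  return gain;
}
```

Under these assumptions, a caller would run Synthesis(...) as usual and then call something like NormalizePeak(y, y_length, 0.9999) only when writing to a format whose samples must stay within -1 to 1, leaving the synthesis output itself unchanged.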