mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

How to speed up synthesis? #134

Closed: OnceJune closed this issue 5 months ago

OnceJune commented 2 years ago

Hi, I tried to use WORLD for synthesis on mobile phones. The audio quality is good, but the speed is not fast enough. Is there any way to speed up synthesis? I used synthesisrealtime with a very small FFT length, and I noticed there are 7 forward/inverse FFTs when processing only one frame. Is it possible to reduce that number? Thanks in advance.

mmorise commented 2 years ago

It isn't easy to speed up synthesis with the currently implemented algorithm. To speed it up, you would need a different algorithm, and I have proposed one for this purpose. Since this algorithm has not been released yet, you would have to implement it yourself if needed. https://ieeexplore.ieee.org/document/9023206

Another approach is to reduce the sampling frequency. A sampling frequency of 24 kHz (or 22.05 kHz) is a reasonable choice that does not degrade the sound quality, and this approach is straightforward.

OnceJune commented 2 years ago

@mmorise Thanks. Currently I'm synthesizing at 16 kHz with an mgc order of 59. I've tried an FFT length of 256, which gives good audio quality. When I decrease the FFT length to 128, the quality gets worse. If I use an mgc order of 23, do you think the quality will be good with an FFT length of 128?

OnceJune commented 2 years ago

https://ieeexplore.ieee.org/document/9023206

I read it but didn't fully understand it, lol.

mmorise commented 2 years ago

I think the appropriate FFT length depends on the F0 of the input signal; the mgc order would not affect the best FFT length.
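mmorise's point can be made concrete with the FFT-size rule used by WORLD's spectral envelope estimator CheapTrick, which, as far as I recall, sizes the FFT so that a window of about three pitch periods at the F0 floor fits. The sketch below restates that rule and its inverse in Python. The formulas are my reading of GetFFTSizeForCheapTrick and GetF0FloorForCheapTrick, the 71 Hz default floor is assumed, and the Python function names are hypothetical; check them against the WORLD source before relying on them.

```python
import math

# Assumption: these mirror WORLD's CheapTrick helpers (GetFFTSizeForCheapTrick /
# GetF0FloorForCheapTrick) as recalled here; verify against cheaptrick.cpp.

def fft_size_for_f0_floor(fs, f0_floor):
    """Power-of-two FFT size large enough for a window of ~3 periods at f0_floor."""
    return int(2.0 ** (1 + int(math.log2(3.0 * fs / f0_floor + 1))))


def f0_floor_for_fft_size(fs, fft_size):
    """Lowest F0 (Hz) whose ~3-period window still fits in fft_size samples."""
    return 3.0 * fs / (fft_size - 3.0)


if __name__ == "__main__":
    fs = 16000
    print(fft_size_for_f0_floor(fs, 71.0))  # assumed default floor of 71 Hz
    for n in (1024, 256, 128):
        print(n, round(f0_floor_for_fft_size(fs, n), 1))
```

With fs = 16000 this prints 1024 for the 71 Hz floor, then F0 floors of about 47.0, 189.7 and 384.0 Hz for FFT lengths 1024, 256 and 128. If the rule holds, an FFT length of 128 at 16 kHz only suits voices whose F0 never drops below roughly 384 Hz, which would be consistent with the quality loss reported above regardless of the mgc order.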

OnceJune commented 2 years ago

https://ieeexplore.ieee.org/document/9023206

Do I understand it correctly? (Please delete this comment if I shouldn't post it here, since your paper is not released yet :))

  1. Prepare 7 band-pass filters;
  2. Prepare MVN;
  3. Prepare a pulse (is it minimum phase, using sp?);
  4. Multiply 1 & 3;
  5. Convolve 2 & 3;
  6. Multiply each subband from 4 by 1-ap, then sum them;
  7. Multiply each subband from 5 by the interpolated ap, then sum them;
  8. Add 6 & 7.

Thank you again.

mmorise commented 2 years ago

There are several tunings for 16-kHz speech synthesis. For example, the number of band-pass filters is three. Fig. 1 in the paper shows how to generate the excitation signal. After that, the algorithm processes the excitation signal with a simple overlap-add (OLA) algorithm. This idea is similar to mixed excitation.
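mmorise compares the idea to mixed excitation. As a generic illustration of mixed excitation only (not the paper's Fig. 1), the sketch below splits a pulse train and a noise signal into three bands and blends them per band with weights 1 - ap (pulse) and ap (noise), following the step list above. The band edges, the linear weights, the brick-wall DFT-bin masks used in place of real band-pass filters, and all function names are assumptions.

```python
import bisect
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def mixed_excitation(pulse, noise, band_ap, fs, band_edges_hz):
    """Blend pulse and noise band by band: (1 - ap) * pulse + ap * noise.

    band_ap holds one value per band, so len(band_ap) == len(band_edges_hz) + 1.
    Bands are ideal DFT-bin masks (circular filtering), not real band-pass filters.
    """
    n = len(pulse)
    P, N = dft(pulse), dft(noise)
    mixed = []
    for k in range(n):
        freq = min(k, n - k) * fs / n  # fold negative-frequency bins onto 0..fs/2
        ap = band_ap[bisect.bisect_right(band_edges_hz, freq)]
        mixed.append((1.0 - ap) * P[k] + ap * N[k])
    # Weights are symmetric in k, so the result is real up to rounding
    return [v.real for v in idft(mixed)]


if __name__ == "__main__":
    fs, n = 16000, 64
    pulse = [1.0 if t % 16 == 0 else 0.0 for t in range(n)]  # toy 1 kHz pulse train
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical 3-band split with invented edges at 1 kHz and 4 kHz
    excitation = mixed_excitation(pulse, noise, [0.1, 0.5, 0.9], fs, [1000.0, 4000.0])
    print(len(excitation))
```

Because ideal masks partition the spectrum, ap = 0 in every band returns the pulse unchanged and ap = 1 returns the noise, which is a quick sanity check before swapping in real filters.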

Prepare Pulse (Is it minimum phase using sp?);

No. This algorithm uses a zero-phase spectrum to compensate for the original signal completely.

OnceJune commented 2 years ago

@mmorise Many thanks for your answer. I found the minimum-phase code in WORLD; how can I find the zero-phase spectrum?

mmorise commented 2 years ago

The zero-phase spectrum of a spectrum X[k] is defined as |X[k]|. In this synthesis, we use zero phase as the phase spectrum of the excitation signals. After generating the excitation signal, the minimum-phase spectrum generated from the spectral envelope is used.
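To make the two phase choices concrete, here is a hedged sketch using a naive DFT. zero_phase keeps |X[k]| and discards the phase, which yields a real signal that is symmetric around t = 0 and peaks there. minimum_phase_from_amplitude uses standard real-cepstrum folding to build a minimum-phase response with a given magnitude; WORLD's own minimum-phase routine may differ in details, and both function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def zero_phase(x):
    """Keep |X[k]| and drop the phase: real, symmetric around t = 0."""
    return [v.real for v in idft([abs(v) for v in dft(x)])]


def minimum_phase_from_amplitude(amp, eps=1e-12):
    """Minimum-phase impulse response with magnitude amp (len(amp) must be even)."""
    n = len(amp)
    cep = [v.real for v in idft([math.log(max(a, eps)) for a in amp])]  # real cepstrum
    folded = [0.0] * n
    folded[0] = cep[0]
    for m in range(1, n // 2):
        folded[m] = 2.0 * cep[m]      # double the causal part
    folded[n // 2] = cep[n // 2]      # keep the Nyquist quefrency once
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in idft(spectrum)]


if __name__ == "__main__":
    n = 64
    # Magnitude of h[t] = 0.5**t, a response that is already minimum phase
    amp = [abs(1.0 / (1.0 - 0.5 * cmath.exp(-2j * math.pi * k / n))) for k in range(n)]
    h = minimum_phase_from_amplitude(amp)
    print([round(v, 4) for v in h[:4]])  # close to [1.0, 0.5, 0.25, 0.125]
```

The example recovers 0.5**t from the magnitude alone, which shows that the minimum-phase step carries all of the envelope's timing, while the zero-phase excitation contributes none of its own.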

bfs18 commented 8 months ago

Hi @mmorise, how is the pulse generated? Is it generated from the pitch using logic similar to GetPulseLocationsForTimeBase in the WORLD code?

mmorise commented 8 months ago

Yes, the pulse is generated based on the temporal positions of the vocal cord vibrations calculated by GetPulseLocationsForTimeBase in the synthesis function. Specifically, an amplitude of 1 is given at these positions.
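The pulse placement described above can be sketched as phase accumulation: integrate 2*pi*F0/fs sample by sample and put a unit impulse wherever the phase wraps. This is a simplified reading of the idea behind GetPulseLocationsForTimeBase, not a port of it. The function names, the 5 ms frame period, and treating F0 <= 0 as "no phase advance" are assumptions, and WORLD itself may handle unvoiced regions and sub-sample pulse positions differently.

```python
import math


def f0_frames_to_samples(f0_frames, frame_period_ms, fs):
    """Linearly interpolate a frame-rate F0 contour (Hz, >= 2 frames) to the sample grid."""
    hop = frame_period_ms * fs / 1000.0  # samples per frame
    n_samples = int((len(f0_frames) - 1) * hop) + 1
    f0 = []
    for i in range(n_samples):
        pos = i / hop
        j = min(int(pos), len(f0_frames) - 2)
        frac = pos - j
        f0.append((1.0 - frac) * f0_frames[j] + frac * f0_frames[j + 1])
    return f0


def pulse_train_from_f0(f0_per_sample, fs):
    """Amplitude 1 at each sample where the accumulated phase passes 2*pi.

    Samples with f0 <= 0 (treated as unvoiced in this sketch) do not advance the phase.
    """
    pulses = [0.0] * len(f0_per_sample)
    phase = 0.0
    for i, f0 in enumerate(f0_per_sample):
        if f0 > 0.0:
            phase += 2.0 * math.pi * f0 / fs
        if phase >= 2.0 * math.pi:
            phase -= 2.0 * math.pi
            pulses[i] = 1.0
    return pulses


if __name__ == "__main__":
    fs = 16000
    f0 = f0_frames_to_samples([200.0] * 201, 5.0, fs)  # 1 s of a flat 200 Hz contour
    pulses = pulse_train_from_f0(f0, fs)
    print(sum(pulses))  # about 200 pulses, one every ~80 samples
```

Accumulating phase rather than stepping by 1/F0 keeps pulse spacing smooth when F0 changes within a frame, since each sample advances the phase by its own interpolated F0.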

bfs18 commented 8 months ago

Hi @mmorise, thanks for your kind reply. My savior is online now. xD I'm implementing the algorithm, but due to my limited knowledge of audio signal processing, I have some questions about the details. Also, this post is somewhat misleading.

I annotated the questions in the figure. [figure: 20231103-120047]

  1. Is the filter applied via sliding-window multiplication and summation (temporal convolution)?
  2. Does the * symbol indicate temporal convolution? And is the temporal convolution implemented via FFT frame by frame? If so, this part uses the FFT N times, which is time-consuming.
  3. Is Ap the AperiodicRatio in the WORLD code? Does that symbol indicate scalar multiplication?
  4. Is c = sqrt(number of samples in a frame)?
  5. Is envelope shaping implemented by multiplying the temporal signal by the interpolated AperiodicRatio?
  6. Is step 2 of the algorithm the same as I depicted? The spectrum is first transformed into a minimum-phase spectrum, which is then multiplied by the FFT of the excitation signal of the corresponding frame, and finally an IFFT is performed.
  7. What is the number of taps of the filters used in 1?
  8. How is V/UV used in this algorithm?
  9. How do you "calculate the filter and the convolution in advance", as mentioned in Section III?
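Questions 2 and 6 ask how the frame-wise filtering is done. The paper and the MATLAB implementation are the authority here; the following is only a generic sketch of time-varying filtering by overlap-add, the OLA step mmorise mentions earlier in the thread. The helper names, the periodic Hann window at 50% overlap, and direct time-domain convolution in place of an FFT are all assumptions.

```python
import math


def periodic_hann(n):
    # Periodic Hann window: two copies shifted by n/2 sum exactly to 1
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def convolve(x, h):
    """Direct (time-domain) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def ola_filter(excitation, impulse_responses, hop):
    """Time-varying filtering by overlap-add (hypothetical helper).

    Frame m spans samples [(m - 1) * hop, (m + 1) * hop), is weighted by a
    periodic Hann window of length 2 * hop, filtered with impulse_responses[m],
    and added back at the same offset. Output is truncated to len(excitation).
    """
    win = periodic_hann(2 * hop)
    n = len(excitation)
    out = [0.0] * n
    for m, h in enumerate(impulse_responses):
        start = (m - 1) * hop
        seg = [excitation[start + i] * win[i] if 0 <= start + i < n else 0.0
               for i in range(2 * hop)]
        for i, v in enumerate(convolve(seg, h)):
            if 0 <= start + i < n:
                out[start + i] += v
    return out


if __name__ == "__main__":
    x = [math.sin(0.3 * t) for t in range(64)]
    # 64 // 8 + 1 = 9 frames cover every sample with two overlapping windows
    y = ola_filter(x, [[1.0]] * 9, 8)
    print(max(abs(a - b) for a, b in zip(x, y)))  # ~0: identity filter reproduces x
```

To cover every sample with two windows, pass len(excitation) // hop + 1 impulse responses. Direct convolution costs roughly (2 * hop) * taps multiply-adds per frame, so for short responses it can be cheaper than an FFT per frame; for long responses FFT-based convolution usually wins.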

I'm sorry for so many questions, and I look forward to your replies.

mmorise commented 8 months ago

Sorry, I misunderstood. Do you have a MATLAB license? If so, you can download a MATLAB implementation (please see TestWORLDRequiem.m for usage). https://www.isc.meiji.ac.jp/~mmorise/world/english/download.html

If you don't have one, I'll explain it again, but please give me some time because I have forgotten the details.

I didn't implement a C++ version, which would be helpful for practical use, because it may come close to a patent held by another company. This is a precaution to avoid patent trouble. I think patent trouble is unlikely, but please use it at your own risk if you implement this program in C++.

bfs18 commented 8 months ago

Thank you for your quick reply!!! Great, the MATLAB code is open source. I'll dive into the MATLAB code first.

bfs18 commented 8 months ago

Hi @mmorise, the MATLAB code is concise and clear. Now I grasp the idea and the implementation details of the paper. Thank you!!