OnceJune closed this issue 9 months ago
It is not easy to speed up the synthesis with the implemented algorithm. To make it faster, you would need to implement a different algorithm, and I have proposed one for this purpose. Since an implementation of it has not been released yet, you would have to implement it yourself if you need it. https://ieeexplore.ieee.org/document/9023206
Another approach is to reduce the sampling frequency. Sampling at 24 kHz (or 22.05 kHz) is a reasonable choice that does not degrade the sound quality, and it is straightforward to do.
@mmorise Thanks. Currently I'm synthesizing at 16 kHz with MGC order 59. I tried FFT length 256, which gives good audio quality. When I decrease the FFT length to 128, the quality gets worse. If I use MGC order 23, do you think the quality will be good with FFT length 128?
I read it but didn't really understand it, lol.
I think the appropriate FFT length depends on the F0 of the input signal, and the MGC order would not affect the best FFT length.
Am I understanding this correctly? (Please delete this comment if I shouldn't post it here since your paper is not released yet :))
Thank you again.
There are several tunings for 16-kHz speech synthesis. For example, the number of band-pass filters is three. Fig. 1 in the paper shows how to generate the excitation signal. After that, the algorithm processes the excitation signal with a simple overlap-add (OLA) algorithm. This idea is similar to mixed excitation.
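For readers unfamiliar with overlap-add, here is a minimal generic sketch in Python. It is not the procedure from the paper: the function name `overlap_add` and the toy responses are assumptions made for illustration. It only shows the basic idea of summing short segments into an output buffer at given start positions.

```python
def overlap_add(num_samples, positions, responses):
    """Sum each short response into the output, starting at its position.

    positions: start index of each segment (hypothetical input format).
    responses: one list of samples per position.
    """
    out = [0.0] * num_samples
    for pos, response in zip(positions, responses):
        for j, value in enumerate(response):
            # Ignore samples that would fall outside the output buffer.
            if 0 <= pos + j < num_samples:
                out[pos + j] += value
    return out


# Two identical 3-sample responses that overlap by one sample.
result = overlap_add(6, [0, 2], [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```

Where segments overlap, their samples are simply added, so the cost per pulse is proportional to the segment length.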
Prepare Pulse (Is it minimum phase using sp?);
No. This algorithm uses a zero-phase spectrum to compensate for the original signal completely.
@mmorise Many thanks for your answer. I found the minimum-phase code in WORLD. How can I find the zero-phase spectrum?
The zero-phase spectrum of a spectrum X[k] is defined as |X[k]|. In this synthesis, we use zero phase as the phase spectrum of the excitation signals. After generating the excitation signal, the minimum-phase spectrum generated from the spectral envelope is used.
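As a small illustration of the definition |X[k]| only (not of the synthesis algorithm itself), the sketch below replaces a spectrum by its magnitude and transforms back. The helper names `dft` and `zero_phase_response` are made up for this example, and a naive DFT is used so that only the Python standard library is needed.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, adequate for a short illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def zero_phase_response(x):
    """Replace the spectrum X[k] of x by |X[k]| and return the time signal."""
    magnitude = [abs(v) for v in dft(x)]  # |X[k]|: the phase is zero everywhere
    y = dft(magnitude, inverse=True)
    return [v.real for v in y]  # the imaginary part is only rounding noise


x = [0.0, 1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0]
y = zero_phase_response(x)
```

The result `y` has the same magnitude spectrum as `x`, but it is symmetric around sample 0 (circularly), which is the time-domain signature of a zero-phase signal.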
Hi @mmorise, how is the pulse generated? Is it generated from the pitch with logic similar to GetPulseLocationsForTimeBase in the WORLD code?
Yes, the pulse is generated based on the temporal positions of the vocal cord vibrations calculated by GetPulseLocationsForTimeBase in the synthesis function. In detail, amplitude 1 is given at these positions.
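A simplified sketch of this idea in Python follows. It is not a port of GetPulseLocationsForTimeBase: the function `unit_pulse_train`, the per-sample F0 input, and the phase reset in unvoiced regions are all assumptions chosen to keep the example short. It accumulates phase from F0 and writes amplitude 1 wherever a full cycle completes.

```python
def unit_pulse_train(f0_per_sample, fs):
    """Return an excitation with amplitude 1 at each completed pitch cycle.

    f0_per_sample: F0 in Hz for every output sample (hypothetical format);
    values <= 0 mark unvoiced samples, which receive no pulse.
    """
    excitation = [0.0] * len(f0_per_sample)
    phase = 0.0  # accumulated phase, measured in cycles
    for i, f0 in enumerate(f0_per_sample):
        if f0 <= 0.0:
            phase = 0.0  # simplification: restart the cycle after unvoiced parts
            continue
        phase += f0 / fs
        if phase >= 1.0:
            phase -= 1.0
            excitation[i] = 1.0
    return excitation


fs = 16000
f0 = [125.0] * 512  # constant 125 Hz, so one period is 16000 / 125 = 128 samples
excitation = unit_pulse_train(f0, fs)
positions = [i for i, v in enumerate(excitation) if v == 1.0]
```

With a constant F0 of 125 Hz at 16 kHz, the pulses in `positions` are spaced exactly one pitch period (128 samples) apart.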
Hi @mmorise, thanks for your kind reply. My savior is online now. xD I'm implementing the algorithm, but due to my limited knowledge of audio signal processing, I have some questions about the details. Besides, this post is sort of misleading.
I annotated the questions in the figure.
Does the * symbol indicate temporal convolution? If so, is the temporal convolution implemented frame by frame via FFT? In that case, this part runs the FFT N times, which is time-consuming.
Is Ap the AperiodicRatio in the WORLD code?
Does the ▷ symbol indicate scalar multiplication?
I'm sorry for so many questions and I look forward to your replies.
Sorry, I misunderstood. Do you have a MATLAB license? If yes, you can download a MATLAB implementation (please see TestWORLDRequiem.m for the usage). https://www.isc.meiji.ac.jp/~mmorise/world/english/download.html
If you don't have it, I'll explain it again, but please give me some time because I have forgotten the details.
I didn't implement a C++ version, which would be helpful for practical use, because it may be close to a patent held by another company. This is a precaution to avoid patent trouble. I think it is unlikely to cause patent trouble, but please use it at your own risk if you implement this program in C++.
Thank you for your quick reply!!! Great, the MATLAB code is open source. I'll dive into the MATLAB code first.
Hi @mmorise, the MATLAB code is concise and clear. Now I grasp the idea and the implementation details of the paper. Thank you!!
Hi, I tried to use WORLD for synthesis on mobile phones. The audio quality is good, but the speed is not fast. Is there any way to speed up synthesis? I call synthesisrealtime with a very small FFT length, and I noticed there are 7 forward/inverse FFTs when processing a single frame. Is it possible to reduce that number? Thanks in advance.