mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

Lossless decomposition of .wav signal #135

Closed tshmak closed 2 years ago

tshmak commented 2 years ago

Is it possible to achieve a lossless decomposition of a .wav signal using WORLD? If so, what parameters/options should I be using?

Basically, I'm hoping that my .wav signal (x), after being decomposed into (f0, sp, ap) and re-synthesized, would match the original exactly.

Many thanks in advance for any help.

mmorise commented 2 years ago

Unfortunately, lossless decomposition is impossible for general vocoders, including WORLD, because they generally discard the phase information of the input signal. Additionally, the waveform synthesis algorithm uses random numbers to generate the unvoiced component, so the output depends on the random seed.
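To illustrate the phase point in general terms, here is a minimal sketch in plain Python. It is not WORLD's analysis; it only shows that two different signals can share the same magnitude spectrum, so a representation that keeps magnitudes and drops phase cannot tell them apart. The helper name `dft_magnitudes` and the toy signals are made up for this example.

```python
import cmath

def dft_magnitudes(x):
    """Magnitude of each DFT bin of x (naive O(n^2) DFT, fine for toy input)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# Two different toy signals: an impulse and the same impulse delayed by one sample.
impulse = [1.0, 0.0, 0.0, 0.0]
delayed = [0.0, 1.0, 0.0, 0.0]

mags_a = dft_magnitudes(impulse)
mags_b = dft_magnitudes(delayed)

# The delay only changes the phase, so the magnitudes agree bin by bin.
same_magnitudes = all(abs(a - b) < 1e-9 for a, b in zip(mags_a, mags_b))
```

Here `same_magnitudes` is true even though `impulse != delayed`, which is why the original waveform cannot be recovered once the phase has been thrown away.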

tshmak commented 2 years ago

If I want to modify only a small section of an audio, say I have a .wav of around 6sec, and I want to modify only around 0.2sec within, what's your recommended approach? Currently, if I use WORLD to decompose the audio and modify only the section I want to modify, the entire audio is changed.

Moreover, I noticed that the audio generated by WORLD seems to be about one frame longer than the original. Can I just ignore the last frame?

Many thanks for your help!

mmorise commented 2 years ago

> If I want to modify only a small section of an audio, say I have a .wav of around 6sec, and I want to modify only around 0.2sec within, what's your recommended approach?

One simple idea is to crossfade between the original and the synthesized speech.
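A minimal sketch of that crossfading idea, in plain Python on lists of samples (in practice NumPy arrays would be the usual choice). It assumes the synthesized signal is sample-aligned with the original; the function name `splice_with_crossfade`, its parameters, and the linear fade shape are choices made for this illustration, not part of WORLD.

```python
def splice_with_crossfade(original, synthesized, start, end, fade):
    """Return a copy of `original` where samples [start, end) come from
    `synthesized`, with a linear crossfade of `fade` samples on each side.

    All positions are sample indices. Samples outside
    [start - fade, end + fade) are left exactly as in `original`.
    """
    if not (0 <= start - fade and start <= end and end + fade <= len(original)):
        raise ValueError("edit region plus fades must lie inside the original")
    if len(synthesized) < end + fade:
        raise ValueError("synthesized signal is too short for the edit region")

    out = list(original)

    # Fade in: the weight on the synthesized signal ramps up towards 1.
    for i in range(fade):
        w = (i + 1) / (fade + 1)
        n = start - fade + i
        out[n] = (1.0 - w) * original[n] + w * synthesized[n]

    # Edited region: synthesized signal only.
    for n in range(start, end):
        out[n] = synthesized[n]

    # Fade out: the weight on the synthesized signal ramps back down to 0.
    for i in range(fade):
        w = 1.0 - (i + 1) / (fade + 1)
        n = end + i
        out[n] = (1.0 - w) * original[n] + w * synthesized[n]

    return out
```

For a 0.2 s edit at a sampling rate `fs`, `start` and `end` would be the edit boundaries multiplied by `fs`, and `fade` a few milliseconds' worth of samples. Because the synthesized waveform is not phase-matched to the original, the fade length and position are probably best tuned by ear.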

> Can I just ignore the last frame?

Yes, you can ignore the last frame. Note, however, that doing so shortens the synthesized signal.
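If the goal is simply to end up with the same number of samples as the input, one alternative to dropping a whole frame is to trim (or zero-pad) the synthesized waveform itself. A minimal sketch, assuming the signals are plain sequences of samples; `match_length` is a hypothetical helper, not a WORLD function.

```python
def match_length(synthesized, target_length):
    """Trim or zero-pad `synthesized` so it has exactly `target_length` samples."""
    if target_length < 0:
        raise ValueError("target_length must be non-negative")
    trimmed = list(synthesized[:target_length])
    # Pad with silence if the synthesized signal came out shorter.
    trimmed.extend([0.0] * (target_length - len(trimmed)))
    return trimmed
```

Calling `match_length(y, len(x))` on the synthesized signal `y` gives an output the same length as the original `x`, which also makes sample-wise operations such as crossfading straightforward.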

tshmak commented 2 years ago

Thanks for your help!