mazzzystar / randomCNN-voice-transfer

Audio style transfer with shallow random parameters CNN.
https://soundcloud.com/mazzzystar/sets/speech-conversion-sample

Phase restoration algorithm #9

Closed: asmekal closed this issue 5 years ago

asmekal commented 5 years ago

Hi @mazzzystar, thank you for the great work! I am interested in your phase restoration algorithm:

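import numpy as np
import librosa
import soundfile as sf  # librosa.output.write_wav was removed in librosa 0.8

def spectrum2wav(spectrum, sr, outfile):
    # Assumes `spectrum` holds log(1 + |STFT|), shape (1 + n_fft // 2, n_frames)
    a = np.exp(spectrum) - 1  # undo log1p to get the linear magnitude
    n_fft = 2 * (spectrum.shape[0] - 1)  # recover n_fft from the number of bins
    # Start from a uniformly random phase in [-pi, pi)
    p = 2 * np.pi * np.random.random_sample(spectrum.shape) - np.pi
    for _ in range(50):
        S = a * np.exp(1j * p)  # known magnitude combined with the current phase
        x = librosa.istft(S)  # back to the time domain
        p = np.angle(librosa.stft(x, n_fft=n_fft))  # keep only the new phase
    sf.write(outfile, x, sr)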

Here we just make a random initial guess for the phase, and after several iterations of forward and inverse transforms the phase estimate somehow improves.

So my question is: why should it converge? Do you know of any relevant literature on this? I am relatively new to audio processing and can't understand why this algorithm should work so well.
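The improvement the question describes can be measured directly. Below is a minimal NumPy sketch, not the repo's code: it uses a hand-written full-spectrum STFT and a least-squares inverse STFT (windowed overlap-add divided by the summed squared window), and the frame size, hop, window, and the helper names `stft`, `istft`, and `griffin_lim_errors` are all illustrative assumptions. It runs the same loop as `spectrum2wav` on the magnitude of a synthetic two-tone signal and records, after each pass, the squared error between the target magnitude and the magnitude of the STFT of the current waveform.

```python
import numpy as np

# Hypothetical parameters for this sketch (not the repo's settings).
N_FFT = 256  # frame length
HOP = 64     # hop size (75% overlap)
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)  # periodic Hann


def stft(x):
    """Full-spectrum STFT: one row of N_FFT bins per frame, no padding."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[m * HOP:m * HOP + N_FFT] * WINDOW for m in range(n_frames)])
    return np.fft.fft(frames, axis=1)


def istft(Y, length):
    """Least-squares inverse STFT: windowed overlap-add / summed squared window."""
    frames = np.fft.ifft(Y, axis=1).real  # best real frames for the given spectra
    num = np.zeros(length)
    den = np.zeros(length)
    for m, frame in enumerate(frames):
        start = m * HOP
        num[start:start + N_FFT] += WINDOW * frame
        den[start:start + N_FFT] += WINDOW ** 2
    # Samples that no window touches are unconstrained; leave them at zero.
    return np.divide(num, den, out=np.zeros(length), where=den > 1e-12)


def griffin_lim_errors(magnitude, length, n_iter=50, seed=0):
    """Run Griffin-Lim from a random phase and record the error after each pass."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, size=magnitude.shape)
    errors = []
    for _ in range(n_iter):
        x = istft(magnitude * np.exp(1j * phase), length)  # impose the target magnitude
        X = stft(x)  # re-analyse the current waveform
        errors.append(np.sum((np.abs(X) - magnitude) ** 2))  # magnitude mismatch
        phase = np.angle(X)  # keep only the new phase
    return x, np.array(errors)


# Target magnitude taken from a synthetic two-tone signal.
length = 4096
t = np.arange(length)
signal = np.sin(2 * np.pi * 0.03 * t) + 0.5 * np.sin(2 * np.pi * 0.11 * t)
target = np.abs(stft(signal))
x_hat, errs = griffin_lim_errors(target, length)
print(errs[0], errs[-1])
```

With this least-squares inverse, the recorded error should never increase from one iteration to the next (up to floating-point round-off), which is the property behind the convergence of the loop. It can still stall above zero: a non-increasing error does not mean the original phase is recovered.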

mazzzystar commented 5 years ago

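This is called the Griffin-Lim algorithm. The idea is to use the known magnitude spectrogram to gradually reconstruct the phase:
1) We only have the magnitude spectrogram S, so we randomly initialize the phase P and combine the two into a complex spectrogram M = S·e^(iP).
2) Pass M through the ISTFT to get a raw waveform, then take the STFT of that waveform and keep its phase P'.
3) Replace P with P' and combine it with S into a new complex spectrogram M'; M' is typically closer to the spectrogram of a real signal than M was.
4) Repeat steps 2) and 3), here for 50 iterations.
As for why it converges: Griffin and Lim (1984) showed that each iteration never increases the squared distance between the target magnitude S and the magnitude of the STFT of the current waveform, so the error can only decrease or stay the same, although it may settle at a local optimum rather than the true phase.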
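The convergence argument can be written out as a short derivation. It follows the reasoning of Griffin and Lim (1984), but the notation is chosen here for illustration: A is the target magnitude spectrogram, x_i the waveform after iteration i, X_i its STFT, and D a squared distance between the STFT of a signal and a spectrogram. The inverse STFT is assumed to be the least-squares one (windowed overlap-add divided by the summed squared window).

```latex
\begin{aligned}
D(x, Y) &= \sum_{m}\sum_{k} \bigl|X_m(k) - Y_m(k)\bigr|^2, \qquad X = \mathrm{STFT}(x) \\
x_{i+1} &= \arg\min_{x} D(x, \hat{Y}_i) \quad \text{(least-squares inverse STFT)} \\
\hat{Y}_{i+1} &= A \odot e^{\,j \angle X_{i+1}} = \arg\min_{|Y| = A} D(x_{i+1}, Y) \\
D(x_{i+1}, \hat{Y}_{i+1}) &\le D(x_{i+1}, \hat{Y}_i) \le D(x_i, \hat{Y}_i) \\
D(x_i, \hat{Y}_i) &= \sum_{m}\sum_{k} \bigl(|X_{i,m}(k)| - A_m(k)\bigr)^2
\end{aligned}
```

The first inequality holds because keeping the phase of X_{i+1} is the best choice among all spectrograms with magnitude A; the second because x_{i+1} minimizes D for the fixed target from the previous step. Since D is bounded below by zero and never increases, the sequence of error values converges, but nothing forces the limit to be zero or the recovered phase to match the original one.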

asmekal commented 5 years ago

Thank you for help!