ltfat / phaseret

Phase ReTrieval for time-frequency representations
http://ltfat.github.io/phaseret
GNU General Public License v3.0

phaseret_gla_s function parameters number #6

Closed OswaldoBornemann closed 5 years ago

OswaldoBornemann commented 5 years ago

I noticed that in gla.h, the function phaseret_gla_s(const ltfat_complex_s cinit[], const float g[], ltfat_int L, ltfat_int gl, ltfat_int W, ltfat_int a, ltfat_int M, ltfat_int iter, ltfat_complex_s c[]); has 9 parameters.

But when I load libphaseret.so and run print(getattr(lib_phaseret, 'phaseret_gla_s')), the result shows that phaseret_gla_s has 10 parameters.

In [4]: print(getattr(lib_phaseret, 'phaseret_gla_s'))
<cdata 'int(*)(float(*)[2], int *, float *, int, int, int, int, int, int, float(*)[2])' 0x7f1a3ef80ec0>
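A quick way to confirm the count is to parse the printed type string itself. The sketch below uses a hypothetical helper, count_params (not part of cffi or libphaseret), that counts top-level commas in the parameter list. Compared with the 9-parameter prototype quoted from gla.h, the extra entry appears to be the second argument, `int *`.

```python
def count_params(cdata_type: str) -> int:
    """Count top-level parameters in a C function-pointer type string.

    Hypothetical helper for illustration; assumes a non-empty parameter list.
    """
    # The parameter list is the parenthesised group right after the first "(*)".
    start = cdata_type.index("(*)") + len("(*)")
    if cdata_type[start] != "(":
        raise ValueError("not a function-pointer type string")
    depth = 0
    count = 1
    for ch in cdata_type[start:]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:  # end of the parameter list
                break
        elif ch == "," and depth == 1:  # ignore commas inside nested groups
            count += 1
    return count


signature = ("int(*)(float(*)[2], int *, float *, int, int, int, int, int, "
             "int, float(*)[2])")
print(count_params(signature))
```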

OswaldoBornemann commented 5 years ago

@susnak, could you also tell me how phaseret wraps gla.h? I could not find a compile file like compile_spsi.c.

susnak commented 5 years ago

The GLA algorithm is written in Matlab. There is also a C implementation in libphaseret, but phaseret is not using it.

OswaldoBornemann commented 5 years ago

@susnak, may I ask what const int mask[] in gla.c is?

susnak commented 5 years ago

Hi, the library in phaseret is probably slightly older than the one in libltfat. mask is an array with as many elements as the coefficient array; it can be NULL. Coefficients whose corresponding mask element is non-zero are considered "reliable" and are reset to their initial values after each GLA projection.
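The mask behaviour described above can be sketched in a few lines of plain Python. This only illustrates the stated semantics; it is not the libphaseret code, and the name reset_reliable is made up for the example.

```python
def reset_reliable(c, cinit, mask):
    """Return c with every 'reliable' coefficient restored from cinit.

    c, cinit: sequences of complex coefficients of equal length.
    mask: sequence of ints of the same length, or None (nothing is reset).
    Illustrative sketch only, not the library implementation.
    """
    if mask is None:
        return list(c)
    # A non-zero mask entry marks the coefficient as reliable.
    return [init if m != 0 else cur for cur, init, m in zip(c, cinit, mask)]


# After a projection changed all three coefficients, entries 0 and 2 are restored.
print(reset_reliable([1j, 2j, 3j], [1 + 0j, 2 + 0j, 3 + 0j], [1, 0, 1]))
```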

OswaldoBornemann commented 5 years ago

@susnak thanks a lot. I have now succeeded in reconstructing recorded audio using phaseret_gla_s. But when I feed in Tacotron's output, which is also a spectrogram, the result does not sound as normal as the one from the recording. One thing I should mention is that in my Tacotron project I set the parameters below:

n_fft:2048
hop_length:275
win_length:1102

Here's my code.

import time
import numpy as np
import scipy as sp
import scipy.io.wavfile
from scipy import signal  # needed for signal.lfilter below
import math
import os
from cffi import FFI

ffi = FFI()

libpath = '.'
libltfat_so = os.path.join(libpath, 'libltfat.so')
libltfat_header = os.path.join(libpath, 'ltfat_flat.h')
libphaseret_so = os.path.join(libpath, 'libphaseret.so')
libphaseret_header = os.path.join(libpath, 'phaseret_flat.h')

with open(libltfat_header) as f_header:
    ffi.cdef(f_header.read())

with open(libphaseret_header) as f_header:
    ffi.cdef(f_header.read())

libltfat = ffi.dlopen(libltfat_so)
libphaseret = ffi.dlopen(libphaseret_so)

preemphasis = 0.97

def apply_inv_preemphasis(x):
    if preemphasis == 0:
        raise RuntimeError(" !! Preemphasis is applied with factor 0.0. ")
    return signal.lfilter([1], [1, -preemphasis], x)

def spsi_griffin_lim(spectrogram, save_path):

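    a = 256    # hop size (time shift) in samples
    M = 2048   # number of frequency channels (FFT length)
    gl = 1102  # window length in samples
    M2 = M // 2 + 1  # number of unique frequency bins for a real signal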
    g = np.asfortranarray(np.zeros(gl, dtype=np.float32, order='F'))

    griffin_lim_iter = 5
    status = getattr(libltfat, 'ltfat_firwin_s')( libltfat.LTFAT_HANN, gl, ffi.cast("float *",g.ctypes.data))

    num_bins, num_frames = spectrogram.shape
    L = num_frames * a
    N = math.ceil(L / M)
    L = N * M   # pad the signal length up to a multiple of M
    N = L // a  # number of time frames after padding

    c = np.asfortranarray(np.zeros((M2,N), dtype=np.complex64, order='F'))
    cout_spsi = np.asfortranarray(np.zeros((M2,N), dtype=np.complex64, order='F'))
    cout_gl = np.asfortranarray(np.zeros((M2,N), dtype=np.complex64, order='F'))

#     cinit = np.asfortranarray(np.abs(spectrogram)*np.exp( 1j*np.zeros((M2,num_frames),dtype=np.float32) ))
    s = np.asfortranarray( np.abs(spectrogram), dtype=np.float32 )

    t1 = time.time()
    # SPSI Result
    status = getattr(libphaseret, 'phaseret_spsi_s')(
        ffi.cast("float *", s.ctypes.data),
        L, 1, a, M,
        ffi.cast("float *", 0),
        ffi.cast("float (*)[2]", cout_spsi.ctypes.data)
    )

    # Griffin Lim Result
    status = getattr(libphaseret, 'phaseret_gla_s')(
        ffi.cast("float (*)[2]", cout_spsi.ctypes.data),
        ffi.cast("int *", 0),
        ffi.cast("float *",g.ctypes.data),
        L, gl, 1, a, M, griffin_lim_iter, 
        ffi.cast("float (*)[2]",cout_gl.ctypes.data)
    )

    fout = np.asfortranarray(np.zeros(L, dtype=np.float32, order='F'))

    status = getattr(libltfat, 'ltfat_idgtreal_fb_s')( 
            ffi.cast("float (*)[2]",cout_gl.ctypes.data),
            ffi.cast("float *",g.ctypes.data),
            L, gl, 1, a, M, 1,
            ffi.cast("float *",fout.ctypes.data))

    print(time.time() - t1)

    return  fout

if __name__ == '__main__':

    test_s = scipy.io.loadmat('test.mat')
    test_s = test_s['arr']
    out = spsi_griffin_lim(test_s, 'test_3.wav')
    wav = apply_inv_preemphasis(out)

spsi_gl_ltfat_python_version.zip

OswaldoBornemann commented 5 years ago

test_6.mat.zip

@susnak I also found that if I run it in a loop, as below, I sometimes get NaN in the phaseret_gla_s result.

if __name__ == '__main__':
    for i in range(100):
        test_s = scipy.io.loadmat('test.mat')
        test_s = test_s['arr']
        out = spsi_griffin_lim(test_s, 'test_3.wav')
        print(out)
        wav = apply_inv_preemphasis(out)
******LOOP 1******
******Griffin Lim Result******
[[nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 ...
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]]
******LOOP 2******
******Griffin Lim Result******
[[ 5.1524485e-03+0.0000000e+00j  3.5603815e-03+0.0000000e+00j
   1.8661154e-03+0.0000000e+00j ... -3.1622776e-08+0.0000000e+00j
  -1.6695224e-39+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 4.8807026e-03-2.9219876e-03j  4.1275481e-03-1.8931143e-03j
   2.1879466e-03-3.4771659e-04j ...  3.1796155e-08+1.5245520e-09j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 3.6010991e-03-4.3456964e-03j  4.4693802e-03-1.6810282e-03j
  -8.2804624e-04-1.6155583e-03j ... -3.0523463e-08-8.6406402e-09j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 ...
 [ 1.0092775e-03-7.4952561e-04j  3.4965572e-04-6.2027108e-04j
  -4.4733029e-06-8.0520740e-06j ... -2.0778240e-08-2.4174357e-08j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 1.1177950e-03-4.4880691e-04j  3.9120653e-04-5.5711932e-04j
  -6.6340008e-06-3.8118292e-06j ...  1.2780666e-08-2.9012147e-08j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 1.2019222e-03+0.0000000e+00j  6.2066404e-04+0.0000000e+00j
  -5.9641452e-06+0.0000000e+00j ...  3.1622776e-08+0.0000000e+00j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]]
******LOOP 3******
******Griffin Lim Result******
[[nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 ...
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]]
******LOOP 4******
******Griffin Lim Result******
[[ 5.1524485e-03+0.0000000e+00j  3.5603815e-03+0.0000000e+00j
   1.8661154e-03+0.0000000e+00j ... -3.1622776e-08+0.0000000e+00j
  -1.6695224e-39+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 4.8807026e-03-2.9219876e-03j  4.1275481e-03-1.8931143e-03j
   2.1879466e-03-3.4771659e-04j ...  3.1796155e-08+1.5245520e-09j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 3.6010991e-03-4.3456964e-03j  4.4693802e-03-1.6810282e-03j
  -8.2804624e-04-1.6155583e-03j ... -3.0523463e-08-8.6406402e-09j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 ...
 [ 1.0092775e-03-7.4952561e-04j  3.4965572e-04-6.2027108e-04j
  -4.4733029e-06-8.0520740e-06j ... -2.0778240e-08-2.4174357e-08j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 1.1177950e-03-4.4880691e-04j  3.9120653e-04-5.5711932e-04j
  -6.6340008e-06-3.8118292e-06j ...  1.2780666e-08-2.9012147e-08j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 1.2019222e-03+0.0000000e+00j  6.2066404e-04+0.0000000e+00j
  -5.9641452e-06+0.0000000e+00j ...  3.1622776e-08+0.0000000e+00j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]]
******LOOP 5******
******Griffin Lim Result******
[[nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 ...
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj ... nan+nanj nan+nanj nan+nanj]]
******LOOP 6******
******Griffin Lim Result******
[[ 5.1524485e-03+0.0000000e+00j  3.5603815e-03+0.0000000e+00j
   1.8661154e-03+0.0000000e+00j ... -3.1622776e-08+0.0000000e+00j
  -1.6695224e-39+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 4.8807026e-03-2.9219876e-03j  4.1275481e-03-1.8931143e-03j
   2.1879466e-03-3.4771659e-04j ...  3.1796155e-08+1.5245520e-09j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 3.6010991e-03-4.3456964e-03j  4.4693802e-03-1.6810282e-03j
  -8.2804624e-04-1.6155583e-03j ... -3.0523463e-08-8.6406402e-09j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 ...
 [ 1.0092775e-03-7.4952561e-04j  3.4965572e-04-6.2027108e-04j
  -4.4733029e-06-8.0520740e-06j ... -2.0778240e-08-2.4174357e-08j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 1.1177950e-03-4.4880691e-04j  3.9120653e-04-5.5711932e-04j
  -6.6340008e-06-3.8118292e-06j ...  1.2780666e-08-2.9012147e-08j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]
 [ 1.2019222e-03+0.0000000e+00j  6.2066404e-04+0.0000000e+00j
  -5.9641452e-06+0.0000000e+00j ...  3.1622776e-08+0.0000000e+00j
   0.0000000e+00+0.0000000e+00j  0.0000000e+00+0.0000000e+00j]]

Here's the spectrogram output from Tacotron: test_6.mat.zip

susnak commented 5 years ago

The way Tacotron, and scipy as well, compute the STFT is slightly different. Since you do not have the STFT phase anyway, it is a bit easier, but still not fully compatible. Please familiarize yourself with http://ltfat.github.io/doc/gabor/dgtreal.html , especially the part about zero-padding the signal to the next multiple of lcm(a,M) and what that means for the size of the coefficient array. Also note that for synthesis you need a window that is dual to the one you used for analysis.
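To make the dual-window remark concrete, here is a minimal pure-Python sketch for the so-called painless case (window length gl <= M), where the canonical dual can be computed sample by sample. It assumes the unnormalized DGT convention, under which reconstruction requires M * sum_k g[n - k*a] * gd[n - k*a] = 1 for every n. The helper names hann and painless_dual are made up for this example and are not libltfat functions, and the Hann window here is not necessarily identical to what ltfat_firwin_s returns.

```python
import math


def hann(gl):
    """Periodic Hann window of length gl (illustrative helper)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / gl) for n in range(gl)]


def painless_dual(g, a, M):
    """Canonical dual of a real window g for hop a and M channels, len(g) <= M."""
    if len(g) > M:
        raise ValueError("painless case requires len(g) <= M")
    # d[r] accumulates g[j]**2 over all window samples j with j % a == r.
    d = [0.0] * a
    for j, v in enumerate(g):
        d[j % a] += v * v
    if min(d) == 0.0:
        raise ValueError("window and hop size do not form a frame")
    return [v / (M * d[j % a]) for j, v in enumerate(g)]


g = hann(16)
gd = painless_dual(g, a=4, M=16)
# Each residue class modulo a should satisfy the reconstruction condition.
for r in range(4):
    print(r, 16 * sum(g[j] * gd[j] for j in range(r, 16, 4)))
```

The settings discussed in this thread (gl = 1102, M = 2048) fall into this painless case, since the window is shorter than the number of channels.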

You probably also want to use the same STFT settings: n_fft:2048 -> M = 2048; hop_length:275 -> a = 256 (!!!! these do not match); win_length:1102 -> gl = 1102.
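As a rough illustration of the zero-padding rule, the sketch below computes the padded length L, the number of time frames N, and the number of frequency bins M2 for a given signal length. The function name dgtreal_sizes and the example signal length of 100000 samples are assumptions made for this example only.

```python
import math


def dgtreal_sizes(Ls, a, M):
    """Return (L, N, M2) for a real signal of Ls samples, hop a, M channels.

    L is Ls rounded up to the next multiple of lcm(a, M).
    """
    lcm = a * M // math.gcd(a, M)
    L = -(-Ls // lcm) * lcm  # ceiling division, then scale back up
    N = L // a               # number of time frames
    M2 = M // 2 + 1          # unique frequency bins for real input
    return L, N, M2


# hop 256 divides 2048, so the padding granularity is only 2048 samples
print(dgtreal_sizes(100000, 256, 2048))
# hop 275 shares no factor with 2048, so lcm(a, M) = 275 * 2048
print(dgtreal_sizes(100000, 275, 2048))
```

With a = 275 the padding unit becomes very large, which is one reason the choice of hop size matters for the size of the coefficient array.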

Good luck! Sorry this is not easier, but it really requires some understanding of the conventions used in LTFAT.

OswaldoBornemann commented 5 years ago

@susnak thanks! May I also ask why, when I run it in a loop as shown above, I sometimes get NaN in the phaseret_gla_s result?

OswaldoBornemann commented 5 years ago

@susnak I also checked cout_spsi, which is correct in every loop iteration, while cout_gl sometimes comes out as NaN.
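The thread does not record what the eventual fix was. One generic sanity check before handing raw pointers to a C routine is to verify that every coefficient buffer has the shape the call implies. The sketch below assumes that each buffer must hold (M // 2 + 1) x (L // a) elements per channel; check_coef_buffer is a hypothetical helper, not a libphaseret function.

```python
def check_coef_buffer(shape, L, a, M, W=1):
    """Raise ValueError unless shape matches the coefficient layout.

    Assumed layout (an assumption of this sketch): M // 2 + 1 rows and
    (L // a) * W columns for a padded length L, hop a, M channels, W signals.
    """
    expected = (M // 2 + 1, (L // a) * W)
    if tuple(shape) != expected:
        raise ValueError(f"buffer shape {tuple(shape)} != expected {expected}")
    return expected


# With the padding used in the code above, 100 input frames at a = 256 give
# L = 26624 and N = 104, so a 100-column input array would be four frames short.
print(check_coef_buffer((1025, 104), L=26624, a=256, M=2048))
```

If the check fails, the C code could read past the end of the array, and the contents of that memory may differ from run to run.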

OswaldoBornemann commented 5 years ago

@susnak thanks susnak. I have solved that problem. Thanks a lot.