Hi, I tried the code with a Chinese corpus, using this config:
"sampling_rate": 16000,
"filter_length": 1024,
"hop_length": 200,
"win_length": 800,
"n_mel_channels": 80,
"mel_fmin": 96.0,
"mel_fmax": 7600.0,
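As a quick arithmetic sanity check on the settings above (my own addition, not from the original post), the hop and window sizes correspond to common TTS frame timings at 16 kHz:

```python
# Frame timing implied by the config above.
sampling_rate = 16000
hop_length, win_length = 200, 800

hop_ms = 1000 * hop_length / sampling_rate        # frame shift in ms
win_ms = 1000 * win_length / sampling_rate        # analysis window in ms
frames_per_second = sampling_rate / hop_length    # mel frames per second

print(hop_ms, win_ms, frames_per_second)  # 12.5 50.0 80.0
```

So each mel frame covers a 50 ms window advanced by 12.5 ms, i.e. 80 frames per second of audio.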
The corpus is about 20 hours, and I picked the 160th-epoch checkpoint to generate my mel spectrograms. I tried Griffin-Lim by modifying inference.ipynb:
# generate the mel spectrogram from text
(y_gen_tst, *r), attn_gen, *_ = model(x_tst, x_tst_lengths, gen=True, noise_scale=noise_scale, length_scale=length_scale)
mel_np = y_gen_tst.cpu().squeeze(0).numpy()  # (n_mel_channels, n_frames)
# invert the mel to audio with Griffin-Lim
res = librosa.feature.inverse.mel_to_audio(mel_np, sr=16000, n_fft=1024, hop_length=200, win_length=800)
and finally:
librosa.output.write_wav('sample_output.wav', res, 16000)
And the output is mostly a long silence. The question is: should I wait for more epochs, or did I use Griffin-Lim the wrong way?
BTW, the generated mel values look like: [-10.xxxx, -11.xxxx, ...]
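For what it's worth, values around -10 suggest the model emits log-scale mels, while `librosa.feature.inverse.mel_to_audio` expects a linear power mel spectrogram, which could explain the near-silent output. A minimal sketch of the conversion I have in mind (assuming a natural-log scale, which is my assumption, not something the repo confirms):

```python
import numpy as np

# Toy stand-in for the model output: log-mel values in the range described
# above (around -10). The real array would be (n_mel_channels, n_frames).
log_mel = np.array([[-10.0, -11.0, -9.5],
                    [-10.5, -12.0, -8.0]])

# If the model emits natural-log mels (an assumption), invert the log before
# Griffin-Lim, since mel_to_audio expects linear-scale (positive) values:
linear_mel = np.exp(log_mel)

# Then invert the linear mel as before, e.g.:
# res = librosa.feature.inverse.mel_to_audio(
#     linear_mel, sr=16000, n_fft=1024, hop_length=200, win_length=800)
```

If the training code uses a different compression (e.g. log10 or a clamped log), the inverse would need to match that exactly.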