keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
MIT License

Spectrogram normalization is different from Mel-Spectrogram #102

Closed: rafaelvalle closed this 6 years ago

rafaelvalle commented 6 years ago

Hey Keith! Thanks for your work.

It looks like the Spectrogram is normalized differently from the Mel-Spectrogram: spectrogram subtracts hparams.ref_level_db from the dB values before calling _normalize, while melspectrogram does not apply that offset.

Is that intentional? If so, why?

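```python
def spectrogram(y):
  D = _stft(preemphasis(y))
  # Linear spectrogram: dB values are shifted down by ref_level_db before normalizing
  S = _amp_to_db(np.abs(D)) - hparams.ref_level_db
  return _normalize(S)

def melspectrogram(y):
  D = _stft(preemphasis(y))
  # Mel-spectrogram: no ref_level_db offset is applied here
  S = _amp_to_db(_linear_to_mel(np.abs(D)))
  return _normalize(S)
```
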
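To make the mismatch concrete, here is a small NumPy sketch (not repo code) that feeds the same magnitudes through both paths. It assumes the repo's _amp_to_db and _normalize behave like the reimplementations below and that the defaults are min_level_db = -100 and ref_level_db = 20; the constant and helper names are illustrative stand-ins, and the mel filterbank is skipped since it does not affect the offset.

```python
import numpy as np

# Assumed defaults, believed to match the repo's hparams; adjust if yours differ.
MIN_LEVEL_DB = -100  # stands in for hparams.min_level_db
REF_LEVEL_DB = 20    # stands in for hparams.ref_level_db


def amp_to_db(x):
    # Assumed equivalent of _amp_to_db: magnitude -> dB, floored at 1e-5 (-100 dB).
    return 20 * np.log10(np.maximum(1e-5, x))


def normalize(S):
    # Assumed equivalent of _normalize: map [MIN_LEVEL_DB, 0] dB onto [0, 1] and clip.
    return np.clip((S - MIN_LEVEL_DB) / -MIN_LEVEL_DB, 0, 1)


# The same five magnitudes: -80, -60, -40, -20 and 0 dB.
magnitudes = np.array([1e-4, 1e-3, 1e-2, 1e-1, 1.0])

# spectrogram() path: offset by ref_level_db before normalizing.
linear_path = normalize(amp_to_db(magnitudes) - REF_LEVEL_DB)
# melspectrogram() path: no offset (mel filtering skipped for brevity).
mel_path = normalize(amp_to_db(magnitudes))

print(linear_path)  # approximately [0.0, 0.2, 0.4, 0.6, 0.8]
print(mel_path)     # approximately [0.2, 0.4, 0.6, 0.8, 1.0]
```

Within the unclipped range the two paths differ by a constant ref_level_db / -min_level_db = 0.2 in normalized units, so under these assumptions linear and mel features end up on shifted scales.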
keithito commented 6 years ago

Hi Rafael, thanks for pointing this out!

The purpose of the offset is to eliminate some of the variation in quiet parts of the spectrogram. It seems like it would be useful to apply it in both melspectrogram and spectrogram, so I think this is an unintentional oversight. Please feel free to send a PR to add it to melspectrogram.
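The suggested fix amounts to subtracting hparams.ref_level_db in melspectrogram as well. The sketch below is not the PR's code: it assumes the repo's _amp_to_db and _normalize behave as reimplemented here, uses assumed defaults min_level_db = -100 and ref_level_db = 20, and wraps the tail of melspectrogram in a hypothetical helper, mel_magnitudes_to_features, that takes precomputed mel magnitudes. It illustrates the "quiet parts" effect: with the offset, bins below -80 dB all clip to 0 instead of spreading over [0, 0.2].

```python
import numpy as np

MIN_LEVEL_DB = -100  # assumed hparams.min_level_db
REF_LEVEL_DB = 20    # assumed hparams.ref_level_db


def amp_to_db(x):
    # Assumed equivalent of the repo's _amp_to_db (floor at 1e-5, i.e. -100 dB).
    return 20 * np.log10(np.maximum(1e-5, x))


def normalize(S):
    # Assumed equivalent of the repo's _normalize: [MIN_LEVEL_DB, 0] dB -> [0, 1].
    return np.clip((S - MIN_LEVEL_DB) / -MIN_LEVEL_DB, 0, 1)


def mel_magnitudes_to_features(mel_mag, apply_offset=True):
    # Hypothetical helper: the last steps of melspectrogram(), taking mel
    # magnitudes (i.e. the output of _linear_to_mel(np.abs(D))) as input.
    S = amp_to_db(mel_mag)
    if apply_offset:
        S = S - REF_LEVEL_DB  # the same offset spectrogram() already applies
    return normalize(S)


# Simulated quiet passage: mel magnitudes between -100 dB and -85 dB.
rng = np.random.default_rng(0)
quiet = 10 ** (rng.uniform(-100, -85, size=1000) / 20)

before = mel_magnitudes_to_features(quiet, apply_offset=False)
after = mel_magnitudes_to_features(quiet, apply_offset=True)

print(before.std() > 0)  # True: quiet bins still vary within [0, 0.15)
print(after.max())       # 0.0: every quiet bin is clipped to the floor
```

The shift also applies to loud bins: a 0 dB bin maps to 0.8 instead of 1.0, so the offset trades some headroom at the top of the range for a flatter floor.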

rafaelvalle commented 6 years ago

Got it. That's a nice trick! I sent you a PR already.