ivanvovk / WaveGrad

Implementation of WaveGrad high-fidelity vocoder from Google Brain in PyTorch.
BSD 3-Clause "New" or "Revised" License

TTS without Text? #18

Open ErfolgreichCharismatisch opened 3 years ago

ErfolgreichCharismatisch commented 3 years ago

As I understand it, this TTS algorithm works with audio files alone, without any text assigned to them.

  1. How would it understand the content and the language?
  2. Does it work with the LJ Speech dataset only, or with any dataset in the LJ Speech structure?

ivanvovk commented 3 years ago

@ErfolgreichCharismatisch modern TTS systems consist of two parts: a feature generator and a vocoder. The feature generator produces low-dimensional time-frequency acoustic features from text, while the vocoder reconstructs the raw waveform from these features. The two models are trained separately. WaveGrad is the second part, the vocoder. It takes acoustic features (mel-spectrograms) as input, not text, so it never needs to understand the content or the language, and it can be trained on an arbitrary audio dataset.
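
To make the split concrete, here is a minimal sketch of the two-stage interface. Everything in it is a toy stand-in and not WaveGrad's actual API: `toy_feature_generator`, `toy_vocoder`, `synthesize`, `HOP_LENGTH` and `N_MELS` are hypothetical names and values chosen for illustration only. The point is just that the second stage receives frames of features and never sees the text.

```python
import math
from typing import List

# Toy values, assumed for illustration; real systems use much larger ones.
HOP_LENGTH = 4  # waveform samples produced per feature frame
N_MELS = 3      # feature channels per frame

Mel = List[List[float]]  # shape: [n_frames][N_MELS]


def toy_feature_generator(text: str) -> Mel:
    """Stage 1 stand-in: text -> acoustic features (one frame per character)."""
    return [[float(ord(ch) % 7)] * N_MELS for ch in text]


def toy_vocoder(mel: Mel) -> List[float]:
    """Stage 2 stand-in: acoustic features -> waveform. It never sees the text."""
    wave: List[float] = []
    for frame in mel:
        # Map the frame to an amplitude in [0, 1), then emit HOP_LENGTH samples.
        amp = sum(frame) / (len(frame) * 7.0)
        wave.extend(
            amp * math.sin(2 * math.pi * k / HOP_LENGTH) for k in range(HOP_LENGTH)
        )
    return wave


def synthesize(text: str) -> List[float]:
    """Full pipeline: the two stages are chained only at inference time."""
    return toy_vocoder(toy_feature_generator(text))
```

In this sketch the vocoder stage depends only on the feature frames, which is why a vocoder can be trained from audio files alone: the features are computed from the recordings themselves, and no transcript is involved.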

ErfolgreichCharismatisch commented 3 years ago

Interesting. So which feature generator(s) does it work with out of the box?