Why do we use audio samples as input in this code? The research paper states that a text sequence is the input to the Tacotron 2 model

keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

MIT License

2.94k stars, 965 forks

Open tanu456 opened 3 years ago

ljuvela commented 3 years ago

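The model is also autoregressive on audio features. Text is the only input at synthesis time, but during training the decoder predicts each mel-spectrogram frame from the encoded text plus the preceding ground-truth frames (teacher forcing), so the audio serves as both a decoder input and the training target.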
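To make the autoregressive point concrete, here is a minimal sketch of how teacher-forced decoder inputs can be built from the target audio features. It is not taken from this repository: the function name `teacher_forcing_inputs`, the all-zero start frame, and the toy 2-bin "mel" frames are assumptions for illustration, and a real implementation would work on tensors and may feed frames in groups.

```python
def teacher_forcing_inputs(mel_frames, n_mels):
    """Build decoder inputs by shifting the target frames right by one step.

    Hypothetical helper for illustration: step t receives the ground-truth
    frame t-1, and step 0 receives an all-zero start frame.
    """
    if not mel_frames:
        return []
    go_frame = [0.0] * n_mels
    # Drop the last target frame: it is only ever predicted, never fed back.
    return [go_frame] + [list(frame) for frame in mel_frames[:-1]]


# Toy target: 3 frames with 2 mel bins each (real models use e.g. 80 bins).
mel_targets = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
decoder_inputs = teacher_forcing_inputs(mel_targets, n_mels=2)

for step, (fed, target) in enumerate(zip(decoder_inputs, mel_targets)):
    print(f"step {step}: decoder sees {fed}, should predict {target}")
```

At inference time no ground-truth audio exists, so the decoder is fed its own previous prediction in place of `mel_targets`, which is why the paper describes text as the model's only input.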