Open pandaGst opened 5 years ago
Hello, @pandaGst. It depends on which dataset you use. For the VCTK and LJSpeech datasets, which are provided as examples in this repository, the input is a character sequence. In our paper, which uses a Japanese corpus, the input is linguistic features, a phoneme sequence, and accentual type labels: https://arxiv.org/abs/1810.11960.
The main network implementation is not restricted to a specific input format. If you develop your own dataset, I think you can integrate it without much work.
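To make the two input styles concrete, here is a minimal sketch of how a character sequence, or a phoneme sequence paired with accentual type labels, could each be turned into integer IDs before being fed to a network. This is not the repository's code: the helper names `build_vocab`, `encode_characters`, and `encode_phonemes_with_accent` are hypothetical, and the accent label values are placeholders rather than real annotations.

```python
from typing import Dict, List, Sequence, Tuple


def build_vocab(symbols: Sequence[str]) -> Dict[str, int]:
    """Map each distinct symbol to an integer ID; 0 is left free for padding."""
    return {s: i + 1 for i, s in enumerate(sorted(set(symbols)))}


def encode_characters(text: str, vocab: Dict[str, int]) -> List[int]:
    """Character input: one ID per character of the raw text."""
    return [vocab[ch] for ch in text]


def encode_phonemes_with_accent(
    phonemes: Sequence[str],
    accent_types: Sequence[int],
    phoneme_vocab: Dict[str, int],
) -> Tuple[List[int], List[int]]:
    """Linguistic input: phoneme IDs plus one accent label per phoneme.

    Assumes (hypothetically) that the two sequences are aligned one-to-one.
    """
    if len(phonemes) != len(accent_types):
        raise ValueError("phonemes and accent_types must have the same length")
    return [phoneme_vocab[p] for p in phonemes], list(accent_types)


# Character-sequence input.
char_vocab = build_vocab("hello")
char_ids = encode_characters("hello", char_vocab)

# Phoneme sequence with placeholder accent labels (illustrative values only).
phonemes = ["a", "k", "a"]
accent_types = [0, 1, 0]
phoneme_ids, accent_ids = encode_phonemes_with_accent(
    phonemes, accent_types, build_vocab(phonemes)
)
```

Either way the model ends up receiving integer sequences, which is why swapping in a new dataset mostly means writing a new encoding step like the ones above.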
How do you obtain these linguistic features, phoneme sequences, and accentual type labels?
In our paper, we used manual annotation.
Hello, great job! May I ask what the input to this Tacotron is: linguistic features or a character sequence?