lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Apache License 2.0
2.03k stars, 319 forks

Why are text-prompts needed in inference? #146

Closed yiwei0730 closed 1 year ago

yiwei0730 commented 1 year ago

If we want to synthesize TTS audio with this system, shouldn't we only need the audio-prompts and the text we want to synthesize? Why are text-prompts required by the inference parser?

chenjiasheng commented 1 year ago

Because during training the prompt was not given its own separate text. Moreover, training did not distinguish between prompt and target at all. So at inference we have to supply text for the audio prompt too: the text prompt should cover as much as the audio prompt does.
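
To make that concrete, here is a rough sketch of how the inputs line up at inference time. It is not the repo's actual code: `build_inference_inputs`, the example strings, and the integer audio codes are all made up for illustration. The only point is that the text side spans prompt plus target, while the audio side holds just the prompt that the model continues from.

```python
def build_inference_inputs(text_prompt, target_text, audio_prompt_codes):
    """Hypothetical helper, not part of the repo: pair up text and audio inputs."""
    # Text side: transcript of the audio prompt followed by the text to synthesize.
    # This mirrors training, where one transcript covers the whole utterance and
    # nothing marks where a "prompt" ends and a "target" begins.
    full_text = f"{text_prompt} {target_text}".strip()
    # Audio side: only the prompt's codec tokens. The model is expected to
    # continue this sequence with audio for the remaining (target) text.
    audio_prefix = list(audio_prompt_codes)
    return full_text, audio_prefix


# Made-up example values (the integers stand in for audio codec tokens).
full_text, audio_prefix = build_inference_inputs(
    text_prompt="hello there",
    target_text="nice to meet you",
    audio_prompt_codes=[101, 57, 902],
)
print(full_text)
print(audio_prefix)
```

If the text prompt were left out, the model would receive audio that has no matching text, a situation it never saw during training.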