Closed: faceair closed this issue 9 months ago
Thanks for your attention. The checkpoint of Vall-E will be released soon, along with information about the dataset.
@lmxue @HeCheng0625 Please post the links to checkpoints in this thread when they are ready.
How much data was involved in the pre-training, and how much of it is in Chinese? Thank you very much.
Thanks for your comments. The pre-trained Amphion Vall-E model trained on LibriTTS has been released here: https://huggingface.co/amphion/valle-libritts
You are welcome to test it and give any feedback.
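For anyone who wants to try it, one way to fetch the released checkpoint is with the `huggingface_hub` library before running inference. A minimal sketch (the local directory name here is just an example, not something the repo prescribes):

```python
# Download the released Vall-E (LibriTTS) checkpoint from Hugging Face.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

# repo_id comes from the link above; local_dir is an arbitrary choice.
ckpt_dir = snapshot_download(
    repo_id="amphion/valle-libritts",
    local_dir="ckpts/tts/valle_libritts",
)
print(ckpt_dir)  # path to the downloaded checkpoint files
```

Alternatively, `git clone https://huggingface.co/amphion/valle-libritts` (with git-lfs installed) achieves the same thing.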
@lmxue I gave it a try, and the quality of the generated audio is not very good. Is this level of quality expected given pre-training on a relatively small dataset like LibriTTS? The command I used:
```sh
sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" \
    --config "ckpts/tts/valle_libritts/args.json" \
    --infer_expt_dir Amphion/ckpts/tts/valle_libritts \
    --infer_output_dir Amphion/ckpts/tts/valle_libritts/result \
    --infer_mode "single" \
    --infer_text "This is a clip of generated speech with the given text from a text to speech model" \
    --infer_text_prompt "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition" \
    --infer_audio_prompt ./LJSpeech-1.1/wavs/LJ001-0001.wav
```
Generated sample: https://drive.google.com/file/d/1xTb6WURcckDbV20TpsgyVRKljM9hj8kK/view?usp=sharing
@dongngm At least 10x more data is needed to reach reasonable quality.