When will the pre-trained weights of VALL-E be released?

faceair commented 10 months ago

How much data was used for pre-training, and how much of it is in Chinese? Thank you very much.

lmxue commented 10 months ago

Thanks for your interest. The VALL-E checkpoint will be released soon, together with information about the training dataset.

zhizhengwu commented 10 months ago

@lmxue @HeCheng0625 Please post the links to checkpoints in this thread when they are ready.

lmxue commented 10 months ago

> How much data was involved in the pre-training, and how much of it is in Chinese? Thank you very much.

Thanks for your comments. The pre-trained Amphion VALL-E model, trained on LibriTTS, has been released here: https://huggingface.co/amphion/valle-libritts

You are welcome to test it and share any feedback.

dongngm commented 9 months ago

@lmxue I gave it a try and found that the quality of the generated audio is not very good. Is this level of quality expected, given that the model was pre-trained on a relatively small dataset like LibriTTS?

sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" \
    --config "ckpts/tts/valle_libritts/args.json" \
    --infer_expt_dir Amphion/ckpts/tts/valle_libritts \
    --infer_output_dir Amphion/ckpts/tts/valle_libritts/result \
    --infer_mode "single" \
    --infer_text "This is a clip of generated speech with the given text from a text to speech model" \
    --infer_text_prompt "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition" \
    --infer_audio_prompt ./LJSpeech-1.1/wavs/LJ001-0001.wav

https://drive.google.com/file/d/1xTb6WURcckDbV20TpsgyVRKljM9hj8kK/view?usp=sharing
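For anyone reproducing the run above: the command passes `--config` as `ckpts/...` but the other two directories as `Amphion/ckpts/...`. Whether that matters depends on the working directory, which is not stated here. A minimal sketch that derives all three paths from one variable might look like the following. `CKPT_DIR`, `AUDIO_PROMPT`, `DRY_RUN` and `valle_infer` are hypothetical names for illustration, not Amphion options; the flags themselves are copied from the command above, and running from the repository root is an assumption.

```shell
#!/bin/sh
# Sketch only: derive every checkpoint path from one variable so that
# --config, --infer_expt_dir and --infer_output_dir cannot drift apart.
# CKPT_DIR, AUDIO_PROMPT and DRY_RUN are hypothetical helper variables.

CKPT_DIR="${CKPT_DIR:-ckpts/tts/valle_libritts}"
AUDIO_PROMPT="${AUDIO_PROMPT:-./LJSpeech-1.1/wavs/LJ001-0001.wav}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

valle_infer() {
    # $1 = text to synthesize, $2 = transcript of the audio prompt
    set -- sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" \
        --config "$CKPT_DIR/args.json" \
        --infer_expt_dir "$CKPT_DIR" \
        --infer_output_dir "$CKPT_DIR/result" \
        --infer_mode "single" \
        --infer_text "$1" \
        --infer_text_prompt "$2" \
        --infer_audio_prompt "$AUDIO_PROMPT"

    if [ "$DRY_RUN" = "1" ]; then
        # Print one argument per line so quoting problems are easy to spot.
        printf '%s\n' "$@"
    else
        # Fail early if the audio prompt is missing.
        [ -f "$AUDIO_PROMPT" ] || {
            echo "audio prompt not found: $AUDIO_PROMPT" >&2
            return 1
        }
        "$@"
    fi
}

# Dry run with the same texts as the command above.
valle_infer \
    "This is a clip of generated speech with the given text from a text to speech model" \
    "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition"
```

With `DRY_RUN=0` the same function executes the command instead of printing it.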

zhizhengwu commented 9 months ago

@dongngm At least 10x more data is needed to reach reasonable quality.

open-mmlab / Amphion, issue #21