open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.45k stars 379 forks

[BUG]: VALLE_V2 inference #252

Closed jiaweiru closed 1 month ago

jiaweiru commented 1 month ago

Some minor questions about the VALLE V2 documentation and the demo.ipynb scripts.

Incorrect path when downloading pre-trained speech tokenizer from HuggingFace.

huggingface-cli download amphion/valle speechtokenizer_hubert_avg/SpeechTokenizer.pt speechtokenizer_hubert_avg/config.json --local-dir ckpts

It should be:

mkdir ckpts/speechtokenizer_hubert_avg
huggingface-cli download amphion/valle SpeechTokenizer.pt config.json --local-dir ckpts/speechtokenizer_hubert_avg
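A quick way to confirm the download landed where it is needed is to check for the two files after running the command. The sketch below assumes the demo expects `SpeechTokenizer.pt` and `config.json` directly under `ckpts/speechtokenizer_hubert_avg`, as the corrected command implies. The helper name `missing_tokenizer_files` is hypothetical and not part of Amphion.

```python
from pathlib import Path

# Assumption: these are the two files the corrected download command fetches.
EXPECTED_FILES = ("SpeechTokenizer.pt", "config.json")


def missing_tokenizer_files(ckpt_dir="ckpts/speechtokenizer_hubert_avg"):
    """Return the expected file names that are not present in ckpt_dir."""
    base = Path(ckpt_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_tokenizer_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Speech tokenizer files found.")
```

An empty list means both files are in place. Anything else points to a wrong `--local-dir` or an incomplete download.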

BTW, the “VALLE” in the example audio path in the demo script should be capitalized. [Amphion/egs/tts/VALLE_V2/demo.ipynb](https://github.com/open-mmlab/Amphion/blob/main/egs/tts/VALLE_V2/demo.ipynb#L134-L136)

jiaqili3 commented 1 month ago

Fixed. Thanks!

jiaweiru commented 1 month ago

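By the way, the sample rate of the final output audio in demo.ipynb should be changed from 24000 to 16000, because the decoder was switched from EnCodec to the speech tokenizer.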
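To illustrate why the declared rate matters, here is a minimal sketch that uses only Python's standard `wave` module rather than the audio library the notebook uses. It writes the same 16 kHz samples twice, once labeled 16000 Hz and once labeled 24000 Hz, and compares the resulting durations. The helper names and the 440 Hz test tone are illustrative assumptions, not taken from the demo.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as 16-bit PCM at the declared rate."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        f.setframerate(sample_rate)
        f.writeframes(pcm)


def wav_duration_seconds(path):
    """Duration implied by the file header: frame count / declared rate."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()


# Stand-in for decoder output: one second of a 440 Hz tone sampled at 16 kHz.
MODEL_RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / MODEL_RATE) for n in range(MODEL_RATE)]

if __name__ == "__main__":
    write_wav("tone_16k.wav", tone, 16000)  # rate matches the samples
    write_wav("tone_24k.wav", tone, 24000)  # mislabeled rate
    print(wav_duration_seconds("tone_16k.wav"))
    print(wav_duration_seconds("tone_24k.wav"))
```

The mislabeled file reports a shorter duration and plays back faster and higher in pitch, which is the symptom to expect when 16 kHz output is saved as 24 kHz.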

jiaqili3 commented 1 month ago

Thank you! Just fixed