[BUG]: VALLE_V2 inference

jiaweiru commented 1 month ago

Some minor questions about the VALLE V2 documentation and the demo.ipynb scripts.

Incorrect path when downloading pre-trained speech tokenizer from HuggingFace.

huggingface-cli download amphion/valle speechtokenizer_hubert_avg/SpeechTokenizer.pt speechtokenizer_hubert_avg/config.json --local-dir ckpts

It should be: `mkdir ckpts/speechtokenizer_hubert_avg`, followed by `huggingface-cli download amphion/valle SpeechTokenizer.pt config.json --local-dir ckpts/speechtokenizer_hubert_avg`
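For anyone scripting this step, the corrected command could be wrapped roughly as below. This is only a sketch: the helper names `build_download_command` and `download_tokenizer` are made up for illustration, and it assumes `huggingface-cli` is installed and on the `PATH`. The repo id, file names, and target directory are the ones from the command above.

```python
import subprocess
from pathlib import Path

# Values taken from the corrected command in this comment.
REPO_ID = "amphion/valle"
FILES = ["SpeechTokenizer.pt", "config.json"]
TARGET_DIR = Path("ckpts") / "speechtokenizer_hubert_avg"


def build_download_command(target_dir=TARGET_DIR):
    """Return the argv list for the corrected huggingface-cli call (hypothetical helper)."""
    return [
        "huggingface-cli", "download", REPO_ID, *FILES,
        "--local-dir", Path(target_dir).as_posix(),
    ]


def download_tokenizer(target_dir=TARGET_DIR):
    """Create the target directory, then run the download (hypothetical helper)."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)  # same effect as the mkdir step
    subprocess.run(build_download_command(target_dir), check=True)
```

Calling `download_tokenizer()` needs network access; `build_download_command()` only assembles the arguments, so it can be inspected without downloading anything.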

BTW, the “VALLE” in the example audio path in the demo script should be capitalized. [Amphion/egs/tts/VALLE_V2/demo.ipynb](https://github.com/open-mmlab/Amphion/blob/main/egs/tts/VALLE_V2/demo.ipynb#L134-L136)

jiaqili3 commented 1 month ago

Fixed. Thanks!

jiaweiru commented 1 month ago

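By the way, the sample rate of the final output audio in demo.ipynb should be changed from 24000 to 16000, because the decoder has been switched from EnCodec to SpeechTokenizer.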
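To illustrate why the declared rate matters, here is a small standard-library sketch (not the notebook's code; `make_tone`, `write_wav`, and `duration_seconds` are hypothetical helpers). It writes the same 16000 samples with two different declared sample rates. Labelling 16 kHz audio as 24 kHz makes it play back 1.5x faster and higher in pitch.

```python
import math
import struct
import wave


def make_tone(num_samples=16000, freq_hz=440.0, source_rate=16000):
    """Generate a sine tone as floats in [-1, 1], sampled at source_rate."""
    return [0.3 * math.sin(2 * math.pi * freq_hz * n / source_rate)
            for n in range(num_samples)]


def write_wav(path, samples, sample_rate):
    """Write mono 16-bit PCM; sample_rate is only the rate declared in the header."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


def duration_seconds(path):
    """Playback duration implied by the header: frames / declared rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Writing `make_tone()` with `sample_rate=16000` gives a file whose duration is one second, while the same samples written with `sample_rate=24000` come out at two thirds of a second, which is the mismatch described above.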

jiaqili3 commented 1 month ago

Thank you! Just fixed

[BUG]: VALLE_V2 inference #252