Open chenxinglili opened 1 year ago
I found two tokenizer models for audio: AudioTokenizer and EncodecTokenizer. In egs, the tts, vc, and se recipes all use the tokenizer "audio". I guess these models are all based on SoundStream. What's the difference between them?
Hi, EncodecTokenizer is based on Meta's Encodec model. Please ignore it; we do not use it. Please focus on AudioTokenizer, which we trained ourselves.
Thanks. Are there any plans to open-source the egs for audio generation (text-to-audio)?
@yangdongchao I am trying to train text-to-audio/sound. I have two questions that I would like to ask: