The difference between AudioTokenizer and EncodecTokenizer?

yangdongchao / UniAudio

The Open Source Code of UniAudio

The difference between AudioTokenizer and EncodecTokenizer? #16

Open chenxinglili opened 1 year ago

chenxinglili commented 1 year ago

I found two tokenizer models for audio: AudioTokenizer and EncodecTokenizer. In egs, the tts, vc, and se recipes all use the tokenizer "audio". I guess both models are based on SoundStream. What's the difference between them?

yangdongchao commented 1 year ago

Hi, EncodecTokenizer is based on Meta's Encodec model. Please ignore it; we do not use it. Please focus on AudioTokenizer, which we trained ourselves.

chenxinglili commented 1 year ago

Thanks. Are there any plans to open source the egs for audio generation (text-to-audio)?

chenxinglili commented 1 year ago

@yangdongchao I am trying to train a text-to-audio/sound model, and I have two questions:

1. Is t2a_format the correct task format in task_formats?
2. I could not find how "text_emb_seq" is encoded in offline_tokenization.py. How should the text be handled? Is it okay if I add a T5 tokenizer manually?
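
For reference, here is a minimal sketch of what "adding T5 manually" might look like, assuming the Hugging Face transformers library and the "t5-base" checkpoint. Both are my assumptions, and the helper names `load_t5` and `encode_text` are hypothetical and not part of UniAudio. Whether "text_emb_seq" expects encoder embeddings in this shape, or which T5 variant fits, is not confirmed in this thread.

```python
from typing import Any


def load_t5(model_name: str = "t5-base"):
    """Load a T5 tokenizer and encoder. The checkpoint name is an assumption."""
    # Imported lazily so encode_text can be used without transformers installed.
    from transformers import AutoTokenizer, T5EncoderModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    encoder = T5EncoderModel.from_pretrained(model_name).eval()
    return tokenizer, encoder


def encode_text(text: str, tokenizer: Any, encoder: Any, max_length: int = 128):
    """Return a [num_tokens, hidden_dim] embedding sequence for one caption."""
    # Tokenize a single caption into a batch of size 1.
    batch = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_length
    )
    outputs = encoder(
        input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]
    )
    # Drop the batch dimension and detach from the autograd graph,
    # since the features are only meant to be stored offline.
    return outputs.last_hidden_state[0].detach()
```

Usage would be roughly `tokenizer, encoder = load_t5()` followed by `emb = encode_text("a dog barks", tokenizer, encoder)`; the result could then be saved alongside the audio tokens, if that is what the text-to-audio format expects.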