yangdongchao / UniAudio

The Open Source Code of UniAudio
http://dongchaoyang.top/UniAudio_demo/

The difference between AudioTokenizer and EncodecTokenizer? #16

Open chenxinglili opened 10 months ago

chenxinglili commented 10 months ago

I found two tokenizer models for audio, AudioTokenizer and EncodecTokenizer. In egs, the tts, vc, and se recipes all use the tokenizer "audio". I guess these models are all based on SoundStream. What is the difference between them?

yangdongchao commented 10 months ago

Hi, EncodecTokenizer is based on Meta's Encodec model. Please ignore it; we do not use it. Please focus on AudioTokenizer, which we trained ourselves.

chenxinglili commented 10 months ago

Thanks. Are there any plans to open-source the egs for audio generation (text-to-audio)?

chenxinglili commented 10 months ago

@yangdongchao I am trying to train a text-to-audio/sound model, and I have two questions:

  1. For task_formats, should I use t2a_format?
  2. I could not find how "text_emb_seq" is encoded in offline_tokenization.py. How should the text be handled? Is it okay if I add a T5 tokenizer manually?