jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License

Please consider a 16K model? #15

Open ywh-my opened 1 month ago

No description provided.

jishengpeng commented 1 month ago

Since music often involves high-sample-rate signals, and the goal of WavTokenizer is to standardize the representation of speech, music, and general audio within a single codec model, we use a 24 kHz model. If you need a 16 kHz model, you can simply modify the configuration file and train one.
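For reference, a minimal sketch of the kind of change this involves, assuming the training config exposes sampling-rate and hop-length fields (the key names below are illustrative and may not match the exact schema of the repo's config files):

```yaml
# Illustrative sketch only: key names are assumptions,
# not necessarily the exact fields in WavTokenizer's configs.
sample_rate: 16000   # was 24000 for the released 24 kHz model
hop_length: 400      # 16000 / 400 = 40 tokens per second,
                     # mirroring 24000 / 600 = 40 in the 24 kHz setup
```

Keeping the hop length proportional to the sampling rate preserves the 40 tokens-per-second rate mentioned in the project description.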