jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License
658 stars, 37 forks

How many hours of Chinese data are there? #30

Closed: LiuShixing closed this issue 6 days ago

LiuShixing commented 6 days ago

How many hours of Chinese data are there? I tested it and found that reconstruction quality is noticeably poor for some Chinese audio. Model: WavTokenizer-medium-speech.

jishengpeng commented 6 days ago

From the entire Common Voice 17 dataset, we randomly selected 500,000 samples, which yielded only about 15 hours of Chinese data. If you need better performance on Chinese, please wait for the large version. In my experience training LanguageCodec, performance on Chinese improves significantly with more data, so for Chinese you might consider using LanguageCodec for now.
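The point that a random draw of 500,000 clips can leave a language with only a few hours is easy to check on your own metadata with a small audit script. Below is a minimal Python sketch. It assumes that clip metadata is available as (language code, duration in seconds) pairs and that the draw is uniform over clips. The reply itself only says "randomly selected". The function name `hours_per_language` and this record layout are hypothetical. They are not part of WavTokenizer or of any Common Voice tooling.

```python
import random
from collections import defaultdict


def hours_per_language(clips, n_samples, seed=0):
    """Draw n_samples clips uniformly at random (without replacement)
    and return the total duration in hours for each language.

    `clips` is a list of (language_code, duration_seconds) tuples; this
    layout is a hypothetical stand-in for real dataset metadata.
    """
    rng = random.Random(seed)  # fixed seed so the audit is reproducible
    subset = rng.sample(clips, min(n_samples, len(clips)))
    totals = defaultdict(float)  # language code -> summed seconds
    for lang, seconds in subset:
        totals[lang] += seconds
    # Convert seconds to hours for each language present in the subset.
    return {lang: secs / 3600.0 for lang, secs in totals.items()}


if __name__ == "__main__":
    # Toy corpus: 900 English clips of 5 s and 100 Chinese clips of 4 s.
    clips = [("en", 5.0)] * 900 + [("zh-CN", 4.0)] * 100
    print(hours_per_language(clips, n_samples=1000))
```

Under uniform sampling, each language's expected share of the subset matches its share of the source corpus. A language that makes up a small fraction of the corpus therefore contributes few hours, however large the overall sample is. Sampling per language (stratified sampling) is one way to guarantee a minimum number of hours for a target language.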