LiuShixing closed this issue 6 days ago.
How many hours of Chinese data were used for training? In my tests, the reconstruction quality of some Chinese audio is noticeably poor. Model: WavTokenizer-medium-speech.
We randomly selected 500,000 samples from the entire CommonVoice 17 dataset, which yielded only about 15 hours of Chinese data. If you need better performance on Chinese, please wait for the large version. In my experience training LanguageCodec, performance on Chinese improves significantly with more data, so for now you might consider using LanguageCodec for Chinese.
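To illustrate why a uniform random sample over a multilingual corpus leaves only a small slice for one language, here is a minimal sketch. It assumes each clip's metadata is available as a hypothetical `(locale, duration_seconds)` pair and uses a hypothetical helper `hours_per_locale`; it is not the actual WavTokenizer data pipeline.

```python
import random
from collections import defaultdict

def hours_per_locale(clips, n_samples, seed=0):
    """Uniformly sample clips and total their duration per locale, in hours.

    clips: list of (locale, duration_seconds) tuples (hypothetical metadata format).
    """
    rng = random.Random(seed)
    # Uniform draw without replacement over clips, regardless of language.
    sample = rng.sample(clips, min(n_samples, len(clips)))
    totals = defaultdict(float)
    for locale, seconds in sample:
        totals[locale] += seconds
    # Convert seconds to hours per locale.
    return {loc: secs / 3600.0 for loc, secs in totals.items()}

# Toy data (made up): 900 English clips of 5 s and 100 Chinese clips of 4 s.
clips = [("en", 5.0)] * 900 + [("zh-CN", 4.0)] * 100
print(hours_per_locale(clips, n_samples=1000))
```

Because the draw is uniform over clips, each language's share of the sample roughly tracks its share of the full corpus, so a language that makes up a small fraction of the corpus ends up with few hours unless it is sampled separately.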