WavTokenizer aims to unify the representation of speech, music, and general audio within a single codec model. Because music often involves high-sample-rate signals, we use a 24 kHz model. If you need a 16 kHz model, simply change the sample rate in the configuration file and train with that.
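
As a rough illustration of the configuration change, the sketch below sets every sample-rate entry in a parsed config to 16000. The key names (`sample_rate`, `sampling_rate`), the helper `with_sample_rate`, and the example layout are assumptions made for illustration, not the project's actual schema, so check the real configuration file for the fields it uses.

```python
from copy import deepcopy

# Hypothetical key names; check the actual configuration file for the real ones.
SAMPLE_RATE_KEYS = {"sample_rate", "sampling_rate"}


def with_sample_rate(config, new_rate):
    """Return a copy of `config` with every sample-rate entry set to `new_rate`."""
    updated = deepcopy(config)
    stack = [updated]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in SAMPLE_RATE_KEYS:
                    node[key] = new_rate
                else:
                    stack.append(value)  # descend into nested sections
        elif isinstance(node, list):
            stack.extend(node)
    return updated


# Illustrative stand-in for a parsed config, not the project's real layout.
example_config = {
    "data": {"train_params": {"sampling_rate": 24000, "batch_size": 16}},
    "model": {"sample_rate": 24000},
}

config_16k = with_sample_rate(example_config, 16000)
```

Settings that are expressed in samples rather than seconds, such as segment lengths, may also need adjusting when the sample rate changes.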