jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License

encode and decode for "16k sample" #8

Closed: sunnnnnnnny closed this issue 2 months ago

sunnnnnnnny commented 2 months ago

I need to use the model to encode 16k WAV files. Do I only need to resample the speech to 16k before encoding, and then save a 16k WAV file after decoding the speech tokens?

jishengpeng commented 2 months ago

The input and output for the WavTokenizer are audio files sampled at 24 kHz. If your input audio is sampled at 16 kHz, it must be resampled to 24 kHz before being processed by the WavTokenizer. Similarly, if a 16 kHz output is required, simply resample the 24 kHz output audio back to 16 kHz.