modelscope / FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
https://funcodec.github.io/
MIT License

Relation between bitrate and token ratio #27

Closed: lbehringer closed this issue 8 months ago

lbehringer commented 8 months ago

Hi,

reading your paper, it was unclear to me how exactly the token ratio (TKR) relates to the bitrate. Initially, I thought it meant the number of frames per second at 16 kHz, with one codebook index generated per frame. But then I realized this can't be right, because Table 3 shows different TKRs for the same stride.

Could you further explain the relation between TKR and bitrate, maybe with an example, e.g. for one of the FreqCodec models?

ZhihaoDU commented 8 months ago

For example, with a sampling rate of 16 kHz and a codebook size of 1024 (i.e. log2(1024) = 10 bits per token):
- 4 quantizers, stride = 640: TKR = 16000 / 640 × 4 = 100, bitrate = TKR × 10 = 1000 bps
- 2 quantizers, stride = 320: TKR = 16000 / 320 × 2 = 100, bitrate = TKR × 10 = 1000 bps
- 4 quantizers, stride = 320: TKR = 16000 / 320 × 4 = 200, bitrate = TKR × 10 = 2000 bps

As you can see, even with the same stride, different numbers of quantizers lead to different TKRs and bitrates. Conversely, different combinations of quantizer count and stride can yield the same TKR and bitrate.
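The arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the FunCodec repository; the function names `token_ratio` and `bitrate` are hypothetical. It assumes, as in the reply, that TKR = sampling rate / stride × number of quantizers and that each token costs log2(codebook size) bits.

```python
import math


def token_ratio(sample_rate: int, stride: int, n_quantizers: int) -> float:
    """Hypothetical helper: tokens per second.

    Each stride of samples produces one frame, and each frame yields
    one codebook index per quantizer.
    """
    frames_per_second = sample_rate / stride
    return frames_per_second * n_quantizers


def bitrate(sample_rate: int, stride: int, n_quantizers: int, codebook_size: int) -> float:
    """Hypothetical helper: bits per second, assuming log2(codebook_size) bits per token."""
    bits_per_token = math.log2(codebook_size)
    return token_ratio(sample_rate, stride, n_quantizers) * bits_per_token


if __name__ == "__main__":
    # The three configurations from the reply: (stride, number of quantizers).
    for stride, n_q in [(640, 4), (320, 2), (320, 4)]:
        tkr = token_ratio(16000, stride, n_q)
        bps = bitrate(16000, stride, n_q, codebook_size=1024)
        print(f"stride={stride}, quantizers={n_q}: TKR={tkr:g}, bitrate={bps:g} bps")
```

Running it prints TKR 100/100/200 and bitrates 1000/1000/2000 bps, matching the three cases above.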

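For the FunCodec models, TKR = 16000 / stride × (number of quantizers):
- FunCodec, stride = 320: TKR = 16000 / 320 × (number of quantizers) = 50 × (number of quantizers)
- FunCodec x2, stride = 640: TKR = 16000 / 640 × (number of quantizers) = 25 × (number of quantizers)
- FunCodec x4, stride = 1280: TKR = 16000 / 1280 × (number of quantizers) = 12.5 × (number of quantizers)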
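To see how the three stride variants compare, the sketch below tabulates TKR per stride for a few quantizer counts. It is a hypothetical illustration: the quantizer counts (1, 2, 4, 8) are chosen arbitrarily and are not taken from the paper's tables, and `tkr_table` is not a FunCodec API. Only the 16 kHz sampling rate and the strides 320/640/1280 come from the reply.

```python
SAMPLE_RATE = 16000
# Strides of the three variants mentioned in the reply (x1, x2, x4 downsampling).
STRIDES = (320, 640, 1280)


def tkr_table(quantizer_counts):
    """Hypothetical helper: map stride -> {number of quantizers: TKR}."""
    table = {}
    for stride in STRIDES:
        frames_per_second = SAMPLE_RATE / stride  # 50, 25, 12.5
        table[stride] = {n: frames_per_second * n for n in quantizer_counts}
    return table


if __name__ == "__main__":
    for stride, row in tkr_table([1, 2, 4, 8]).items():
        print(f"stride={stride}: {row}")
```

One consequence this makes visible: stride 320 with 2 quantizers, stride 640 with 4, and stride 1280 with 8 all give TKR = 100, so doubling the stride can be offset by doubling the number of quantizers.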

lbehringer commented 8 months ago

Thanks a lot for the clarification!