hubertsiuzdak / snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
https://hubertsiuzdak.github.io/snac/
MIT License

what is the codebook size/vocab size? #17

Open huu4ontocord opened 1 month ago

huu4ontocord commented 1 month ago

What is the codebook size / vocab size for encoded SNAC data for the various models?

ImenKedir commented 2 weeks ago

I found this on Hugging Face:

24 kHz model:

```json
{
  "sampling_rate": 24000,
  "encoder_dim": 48,
  "encoder_rates": [2, 4, 8, 8],
  "decoder_dim": 1024,
  "decoder_rates": [8, 8, 4, 2],
  "attn_window_size": null,
  "codebook_size": 4096,
  "codebook_dim": 8,
  "vq_strides": [4, 2, 1],
  "noise": true,
  "depthwise": true
}
```

32 kHz model:

```json
{
  "sampling_rate": 32000,
  "encoder_dim": 64,
  "encoder_rates": [2, 3, 8, 8],
  "decoder_dim": 1536,
  "decoder_rates": [8, 8, 3, 2],
  "attn_window_size": 32,
  "codebook_size": 4096,
  "codebook_dim": 8,
  "vq_strides": [8, 4, 2, 1],
  "noise": true,
  "depthwise": true
}
```

44 kHz model:

```json
{
  "sampling_rate": 44100,
  "encoder_dim": 64,
  "encoder_rates": [2, 3, 8, 8],
  "decoder_dim": 1536,
  "decoder_rates": [8, 8, 3, 2],
  "attn_window_size": 32,
  "codebook_size": 4096,
  "codebook_dim": 8,
  "vq_strides": [8, 4, 2, 1],
  "noise": true,
  "depthwise": true
}
```
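For what it's worth, the token rates can be worked out from these configs alone: the encoder downsamples by the product of `encoder_rates`, and each entry in `vq_strides` corresponds to one codebook. Here is a back-of-the-envelope sketch (not from the repo), assuming a stride of `s` means that codebook emits one code per `s` latent frames:

```python
# Token-rate arithmetic derived from the configs above (a sketch, not repo code).
from math import prod

configs = {
    "24khz": {"sampling_rate": 24000, "encoder_rates": [2, 4, 8, 8], "vq_strides": [4, 2, 1]},
    "32khz": {"sampling_rate": 32000, "encoder_rates": [2, 3, 8, 8], "vq_strides": [8, 4, 2, 1]},
    "44khz": {"sampling_rate": 44100, "encoder_rates": [2, 3, 8, 8], "vq_strides": [8, 4, 2, 1]},
}

for name, cfg in configs.items():
    hop = prod(cfg["encoder_rates"])                 # total encoder downsampling
    frame_rate = cfg["sampling_rate"] / hop          # finest latent frame rate (Hz)
    code_rates = [frame_rate / s for s in cfg["vq_strides"]]  # one rate per codebook
    tokens_per_sec = sum(code_rates)
    bitrate = tokens_per_sec * 12                    # log2(4096) = 12 bits per token
    print(f"{name}: codebooks={len(code_rates)}, "
          f"rates={[round(r, 1) for r in code_rates]} Hz, "
          f"~{tokens_per_sec:.0f} tokens/s, ~{bitrate / 1000:.2f} kbps")
```

This gives roughly 0.98 kbps for the 24 kHz model and about 1.9 / 2.6 kbps for the 32 / 44.1 kHz models, which matches the bitrates advertised on the project page.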

I am still trying to figure out the number of codebooks per timestep. For example, if there are 4 codebooks per timestep, the language model would need a vocab size of 4 × 4096 = 16384 (number of codebooks per timestep × codebook size). See the sketch below.
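One way to check empirically is to encode a dummy waveform and look at the shapes of the returned codes. A minimal sketch, assuming the `SNAC.from_pretrained` / `model.encode` API shown in the README and the `hubertsiuzdak/snac_24khz` checkpoint name:

```python
import torch
from snac import SNAC

# Load the 24 kHz model (checkpoint name assumed from the Hugging Face hub).
model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# One second of dummy audio, shape (batch, channels, samples).
audio = torch.randn(1, 1, 24000)

with torch.inference_mode():
    codes = model.encode(audio)  # list of LongTensors, one per codebook

for i, c in enumerate(codes):
    print(f"codebook {i}: shape {tuple(c.shape)}, max index {int(c.max())}")

# The codebooks share codebook_size = 4096 but run at different temporal
# resolutions, so per frame of the coarsest codebook the 24 kHz model emits
# 1 + 2 + 4 = 7 tokens (following vq_strides [4, 2, 1]).
```

Whether the LM vocab has to be `num_codebooks × 4096` or just 4096 then depends on how you flatten the codes: giving each codebook its own 4096-entry offset range (a common choice, not something SNAC prescribes) would mean 3 × 4096 = 12288 for the 24 kHz model and 4 × 4096 = 16384 for the 32 / 44.1 kHz models, while reusing a single 4096-entry range and encoding codebook identity some other way keeps the vocab at 4096.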