huu4ontocord opened 5 months ago
I found this on Hugging Face:
24khz model:

```json
{
  "sampling_rate": 24000,
  "encoder_dim": 48,
  "encoder_rates": [2, 4, 8, 8],
  "decoder_dim": 1024,
  "decoder_rates": [8, 8, 4, 2],
  "attn_window_size": null,
  "codebook_size": 4096,
  "codebook_dim": 8,
  "vq_strides": [4, 2, 1],
  "noise": true,
  "depthwise": true
}
```
32khz model:

```json
{
  "sampling_rate": 32000,
  "encoder_dim": 64,
  "encoder_rates": [2, 3, 8, 8],
  "decoder_dim": 1536,
  "decoder_rates": [8, 8, 3, 2],
  "attn_window_size": 32,
  "codebook_size": 4096,
  "codebook_dim": 8,
  "vq_strides": [8, 4, 2, 1],
  "noise": true,
  "depthwise": true
}
```
44khz model:

```json
{
  "sampling_rate": 44100,
  "encoder_dim": 64,
  "encoder_rates": [2, 3, 8, 8],
  "decoder_dim": 1536,
  "decoder_rates": [8, 8, 3, 2],
  "attn_window_size": 32,
  "codebook_size": 4096,
  "codebook_dim": 8,
  "vq_strides": [8, 4, 2, 1],
  "noise": true,
  "depthwise": true
}
```
I am still trying to figure out the number of codebooks per timestep. For example, if there are 4 codebooks per timestep, the language model would need a vocab size of 4 × 4096 = 16384 (number of codebooks per timestep × codebook size).
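For reference, here is a rough sketch of the arithmetic I am doing for the 24khz config. It assumes that `vq_strides[i]` means codebook `i` emits one code every `vq_strides[i]` encoder frames, and that a flattened-token LM would give each codebook its own id range; I may be misreading the config.

```python
from math import prod

# Values copied from the 24khz config above.
sampling_rate = 24000
encoder_rates = [2, 4, 8, 8]
codebook_size = 4096
vq_strides = [4, 2, 1]

hop_length = prod(encoder_rates)            # 512 samples per encoder frame
frame_rate = sampling_rate / hop_length     # ~46.9 frames per second

# Assumption: codebook i emits one code every vq_strides[i] frames, so the
# finest codebook (stride 1) fires every frame and the coarsest (stride 4)
# fires every 4th frame.
codes_per_frame = sum(1 / s for s in vq_strides)   # 0.25 + 0.5 + 1.0 = 1.75
codes_per_second = frame_rate * codes_per_frame    # ~82 audio tokens per second

n_codebooks = len(vq_strides)               # 3 hierarchical codebooks
# If the LM flattens the codes and gives each codebook its own id range:
flat_vocab = n_codebooks * codebook_size    # 3 * 4096 = 12288
# If all codebooks share one id range, the vocab is just codebook_size = 4096.

print(frame_rate, codes_per_second, n_codebooks, flat_vocab)
```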
What is the codebook size / vocab size for encoded SNAC data for the various models?