dzr1026 opened this issue 1 week ago
It's the quantized semantic feature; see here: https://github.com/zhenye234/xcodec/blob/a2e52d30b1ea424f76bb6b88357484d8021f3ab3/models/soundstream_semantic.py#L114
Thank you for your reply!
@zhenye234 Thanks for your previous response! I have a couple more questions about Table 5, if you don't mind:
Thanks so much for your help!
Thank you for your work; it's a very innovative piece of research. I have a question about the ARCH benchmark results (Table 5): what input was used to produce these results? Specifically, what is the "semantic representation"? Is it the latent after RVQ (Residual Vector Quantization), or is it the sum of the latents from all eight quantizers?
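The two options in the question may be closer than they look. In standard residual vector quantization, the quantized latent is, by construction, the sum of the codewords chosen by each stage. The sketch below is a minimal pure-Python illustration under stated assumptions: 8 stages (matching the "eight quantizers" in the question), toy random codebooks, and nearest-neighbour lookup by squared Euclidean distance. The function names and sizes are hypothetical. This is not the xcodec code, and it does not settle what exactly was fed into the ARCH evaluation.

```python
import random

def nearest(codebook, v):
    """Return the codeword in `codebook` closest to `v` (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((ci - vi) ** 2 for ci, vi in zip(c, v)))

def rvq_encode(x, codebooks):
    """Residual vector quantization: each stage quantizes what earlier stages missed.

    Returns the per-stage codewords, their sum (the quantized latent),
    and the leftover residual.
    """
    residual = list(x)
    stage_outputs = []
    for codebook in codebooks:
        q = nearest(codebook, residual)
        stage_outputs.append(q)
        # Subtract this stage's codeword; the next stage sees only the remainder.
        residual = [r - qi for r, qi in zip(residual, q)]
    # The quantized latent is the element-wise sum over all stage outputs.
    quantized = [sum(vals) for vals in zip(*stage_outputs)]
    return stage_outputs, quantized, residual

# Toy setup (hypothetical sizes): 8 quantizers, 16 codewords each, 4-dim latent.
# Later codebooks use a smaller scale, loosely mimicking finer residual stages.
rng = random.Random(0)
DIM, NUM_QUANTIZERS, CODEBOOK_SIZE = 4, 8, 16
codebooks = [
    [[rng.gauss(0, 1.0 / (k + 1)) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for k in range(NUM_QUANTIZERS)
]
x = [rng.gauss(0, 1) for _ in range(DIM)]

stage_outputs, quantized, residual = rvq_encode(x, codebooks)
print(len(stage_outputs))  # 8: one codeword per quantizer
# x is recovered (up to float error) as quantized latent + final residual.
print(max(abs(xi - (qi + ri)) for xi, qi, ri in zip(x, quantized, residual)))
```

Under this construction, "the latent after RVQ" and "the sum of the eight quantizer outputs" are the same vector. They would differ only if an evaluation used a subset of the stages, or the per-stage outputs individually.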