hhguo / SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
MIT License

Questions about details #1

Closed hbwu-ntu closed 3 weeks ago

hbwu-ntu commented 1 month ago

Hi @hhguo, thank you for the amazing work. May I ask several questions:

  1. Will you be releasing the code for both codec training and language model (LM) training?
  2. PQ is mentioned without a definition. What is the difference between PQ and OPQ?
  3. How is SoCodec-120ms implemented? Do you average the 6 HuBERT embeddings and then apply OPQ?
  4. What are the training resources for codec and TTS-LM?
  5. There is a typo in Section 6.1. It should refer to Table 1, not Fig. 1.
hhguo commented 4 weeks ago

Hi Haibin,

Thanks for your attention.

  1. I am training models within a complicated internal training codebase. I will try to extract the related modules for open-sourcing, but it may take more time.
  2. PQ is "product quantization", OPQ is the proposed "ordered product quantization".
  3. I use the last hidden state of HuBERT as the input to the codec model; the VQ layer is implemented with OPQ.
  4. The codec and the TTS-LM are both trained on WenetSpeech4TTS.
  5. Thanks for your comment! It will be corrected in the updated arXiv paper.
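For readers unfamiliar with the baseline being contrasted here: standard product quantization (PQ) splits a vector into sub-vectors and quantizes each with its own codebook. The sketch below is a minimal, self-contained illustration of plain PQ only (random codebooks, hypothetical `pq_encode`/`pq_decode` helpers); it is not the paper's ordered product quantization (OPQ), which adds an ordering constraint across the groups as described in the paper.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Quantize each contiguous sub-vector of x with its own codebook.

    codebooks: list of (K, d_sub) arrays, one per group.
    Returns one codeword index per group.
    """
    d_sub = codebooks[0].shape[1]
    codes = []
    for g, cb in enumerate(codebooks):
        sub = x[g * d_sub:(g + 1) * d_sub]
        # nearest codeword by squared Euclidean distance
        codes.append(int(np.argmin(((cb - sub) ** 2).sum(axis=1))))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct the vector by concatenating the selected codewords."""
    return np.concatenate([cb[c] for cb, c in zip(codebooks, codes)])

# Toy example with random codebooks (illustrative sizes, not the paper's).
rng = np.random.default_rng(0)
D, G, K = 12, 3, 8  # vector dim, number of groups, codewords per group
codebooks = [rng.normal(size=(K, D // G)) for _ in range(G)]
x = rng.normal(size=D)
codes = pq_encode(x, codebooks)
x_hat = pq_decode(codes, codebooks)
```

In this framing, each frame costs `G * log2(K)` bits; OPQ then structures the `G` codes so that earlier codes carry coarser information, per the paper's description.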