open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.41k stars, 373 forks

[Help]: Latent #174

Closed by a897456 4 months ago

a897456 commented 4 months ago

https://github.com/open-mmlab/Amphion/blob/5cb75d8d605ef12c90c64ba2e04919f4d5d834a1/models/tts/naturalspeech2/ns2.py#L57 In this code, when the latent is obtained, are the decoder and the quantizer in the reverse order?

HeCheng0625 commented 4 months ago

Hi, the NS2 paper uses the latent after RVQ (in fact, it is the continuous vector corresponding to the discrete codes). You can check the details in the NS2 paper.

a897456 commented 4 months ago

I understand. I always thought that the wav becomes the latent right after the encoder, but in fact it becomes the latent after the encoder and RVQ, right?

HarryHe11 commented 4 months ago

Yes, the NS2 paper uses the latent after RVQ. For details, please refer to https://arxiv.org/abs/2304.09116.
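
To make "the continuous vector corresponding to the discrete codes" concrete, here is a toy residual vector quantization (RVQ) sketch in pure Python. It is only an illustration: the function names (`nearest`, `rvq_encode`, `rvq_decode`), the two tiny codebooks, and the 2-dimensional vector `z` are all made up for this example and are not Amphion's code or the NS2 codec. `z` stands in for one frame of encoder output, and `latent` is the sum of the selected codebook entries, i.e. the post-RVQ vector this thread is about.

```python
# Toy RVQ: hypothetical names and values, not Amphion's or NS2's implementation.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(codebooks, z):
    """Quantize z stage by stage; each stage quantizes the residual left by the previous one."""
    residual = list(z)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes

def rvq_decode(codebooks, codes):
    """Sum the selected entries: the continuous vector that corresponds to the discrete codes."""
    out = [0.0] * len(codebooks[0][0])
    for cb, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out

# Assumed toy codebooks: a coarse stage followed by a fine stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

z = [1.3, 0.9]                           # stand-in for one frame of encoder output (pre-RVQ)
codes = rvq_encode(codebooks, z)         # discrete codes, one index per stage
latent = rvq_decode(codebooks, codes)    # continuous post-RVQ vector

print(codes)   # [1, 1]
print(latent)  # [1.25, 1.0]
```

In this toy setup `latent` differs from `z`: it is still a continuous vector, but it is fully determined by the discrete codes, which matches the description of the post-RVQ latent given above.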