Open niu0717 opened 4 years ago
when i read gst paper, i found it contains not only the token but also the tone of the speaker. In other word, can we separate prosody from the ref audio as much as possible?
when i read gst paper, i found it contains not only the token but also the tone of the speaker. In other word, can we separate prosody from the ref audio as much as possible?