Are the audio codec and the diffusion model trained together?

lucidrains / naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch

MIT License



BumbleStone commented 2 months ago

It seems that in this implementation the audio codec and the diffusion model are trained together, rather than separately as described in the paper.
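For comparison, here is a minimal PyTorch sketch of the separate (two-stage) training that the issue attributes to the paper. The codec is trained first and then frozen, and the diffusion model learns only on the codec's latents. `ToyCodec`, `ToyDenoiser`, `train_codec`, and `train_diffusion` are hypothetical stand-ins, not the naturalspeech2-pytorch API. The noise-prediction loss with a simple square-root noise schedule is a generic choice made for illustration and is not necessarily the paper's objective.

```python
import torch
from torch import nn

torch.manual_seed(0)


class ToyCodec(nn.Module):
    """Hypothetical linear autoencoder standing in for the audio codec."""

    def __init__(self, dim: int = 16, latent: int = 4):
        super().__init__()
        self.encoder = nn.Linear(dim, latent)
        self.decoder = nn.Linear(latent, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


class ToyDenoiser(nn.Module):
    """Hypothetical denoiser: predicts the noise added to a latent, given noise level t."""

    def __init__(self, latent: int = 4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent + 1, 32), nn.SiLU(), nn.Linear(32, latent))

    def forward(self, z_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_t, t], dim=-1))


def train_codec(codec: ToyCodec, data: torch.Tensor, steps: int = 200) -> float:
    # Stage 1: train the codec alone on a reconstruction loss.
    opt = torch.optim.Adam(codec.parameters(), lr=1e-2)
    for _ in range(steps):
        loss = nn.functional.mse_loss(codec(data), data)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()


def train_diffusion(denoiser: ToyDenoiser, codec: ToyCodec, data: torch.Tensor, steps: int = 200) -> float:
    # Stage 2: freeze the codec so no gradient ever reaches it.
    codec.eval()
    codec.requires_grad_(False)
    with torch.no_grad():
        z0 = codec.encoder(data)  # latents computed once from the frozen encoder
    # Only the denoiser's parameters are handed to the optimizer.
    opt = torch.optim.Adam(denoiser.parameters(), lr=1e-2)
    for _ in range(steps):
        t = torch.rand(z0.shape[0], 1)  # noise level in [0, 1)
        noise = torch.randn_like(z0)
        z_t = (1 - t).sqrt() * z0 + t.sqrt() * noise  # illustrative noising schedule
        loss = nn.functional.mse_loss(denoiser(z_t, t), noise)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()


data = torch.randn(64, 16)  # random stand-in for audio features
codec = ToyCodec()
denoiser = ToyDenoiser()

codec_loss = train_codec(codec, data)
codec_before = [p.detach().clone() for p in codec.parameters()]
diff_loss = train_diffusion(denoiser, codec, data)

# The codec's weights are identical before and after stage 2.
codec_unchanged = all(torch.equal(a, b) for a, b in zip(codec_before, codec.parameters()))
```

Joint training would typically place both modules' parameters in one optimizer and let the diffusion loss backpropagate into the encoder. Checking `requires_grad` on the codec parameters, and which parameters each optimizer receives, is a quick way to tell which scheme a training script uses.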