Open BumbleStone opened 2 months ago
It seems that the audio codec and the diffusion model are trained jointly, not separately as described in the paper.