Open yl4579 opened 1 week ago
Great work! Have you tested the performance of this codec on diffusion-based models such as SimpleTTS or DiTTo-TTS?
Thank you very much for your question! I have not tested this codec with diffusion-based models such as SimpleTTS or DiTTo-TTS. However, I believe investigating which representations (such as mel, codec latent, or semantic) are better suited for audio diffusion generation could yield valuable insights. Thank you once again for your thoughtful inquiry.