zhenye234 / xcodec

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

diffusion-based model results #5

Open yl4579 opened 1 week ago

yl4579 commented 1 week ago

Great work! Have you tested the performance of this codec on diffusion-based models such as SimpleTTS or DiTTo-TTS?

zhenye234 commented 1 week ago

Thank you very much for your question! I have not tested this codec with diffusion-based models such as SimpleTTS or DiTTo-TTS. However, I believe investigating which representations—such as mel, codec latent, or semantic—are better suited for audio diffusion generation could yield valuable insights. Thank you once again for your thoughtful inquiry.