Open Bai-YT opened 3 weeks ago
@sanchit-gandhi @Vaibhavs10 FYI.
@Bai-YT Thank you for your awesome work! I just finished understanding the paper and think that I have a good grasp of the modeling and inference code to convert to diffusers.
@sayakpaul Could I pick this up if no one's working on it?
Yeah for sure.
@a-r-r-o-w Cool! But let's put it in the community folder to start with.
Sure, sounds good.
Appreciate everyone's time for helping!!! Massive thanks.
Model/Pipeline/Scheduler description
ConsistencyTTA, introduced in the paper "Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation", is an efficient text-to-audio (TTA) generation model. Relative to a comparable diffusion-based TTA model, ConsistencyTTA achieves a 400x generation speed-up while retaining generation quality and diversity.
Due to its high generation quality and fast inference, we believe integrating this model into diffusers will make diffusers more appealing to text-to-audio generation researchers and users! Thank you very much.
Open source status
Provide useful links for the implementation
The open-source code implementation can be found at https://github.com/Bai-YT/ConsistencyTTA.
There is also a simplified implementation for inference only: https://github.com/Bai-YT/ConsistencyTTA/tree/main/easy_inference.
The model checkpoints can be found at https://huggingface.co/Bai-YT/ConsistencyTTA.
I am the main author of the code and am more than happy to assist with the integration.