huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[🌟 New Model] ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation #8414

Status: Open. Bai-YT opened this issue 3 weeks ago.

Bai-YT commented 3 weeks ago

Model/Pipeline/Scheduler description

ConsistencyTTA, introduced in the paper "Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation", is an efficient text-to-audio (TTA) generation model. Compared with a similar diffusion-based TTA model, ConsistencyTTA achieves a 400x generation speed-up while retaining generation quality and diversity.

Given its high generation quality and fast inference, we believe integrating this model into diffusers will make the library more appealing to text-to-audio researchers and users. Thank you very much.

Open source status

Provide useful links for the implementation

The open-source code implementation can be found at https://github.com/Bai-YT/ConsistencyTTA.

There is also a simplified implementation for inference only: https://github.com/Bai-YT/ConsistencyTTA/tree/main/easy_inference.

The model checkpoints can be found at https://huggingface.co/Bai-YT/ConsistencyTTA.
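
For anyone who wants to fetch the checkpoints programmatically, here is a minimal sketch. It assumes the `huggingface_hub` package is installed; the function name `download_checkpoints` and the local directory name are illustrative and are not part of the ConsistencyTTA repository. It only downloads the files and does not load or run the model.

```python
from pathlib import Path

# Repository ID taken from the checkpoint URL given in this issue.
REPO_ID = "Bai-YT/ConsistencyTTA"


def download_checkpoints(local_dir: str = "consistencytta_ckpt") -> Path:
    """Download every file in the checkpoint repository and return the local folder.

    The function name and default directory are hypothetical, chosen for this sketch.
    """
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    ckpt_dir = download_checkpoints()
    # List what was downloaded so the files can be matched against the repo page.
    for f in sorted(ckpt_dir.rglob("*")):
        if f.is_file():
            print(f.relative_to(ckpt_dir))
```

The downloaded files can then be passed to the inference code in the repositories linked above, following their own instructions.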

I am the main author of the code and am more than happy to assist with the integration.

sayakpaul commented 3 weeks ago

@sanchit-gandhi @Vaibhavs10 FYI.

a-r-r-o-w commented 6 days ago

@Bai-YT Thank you for your awesome work! I just finished understanding the paper and think that I have a good grasp of the modeling and inference code to convert to diffusers.

@sayakpaul Could I pick this up if no one's working on it?

sayakpaul commented 6 days ago

Yeah for sure.

yiyixuxu commented 6 days ago

@a-r-r-o-w Cool! But let's put it in the community folder to start with.

a-r-r-o-w commented 6 days ago

Sure, sounds good.

Bai-YT commented 6 days ago

I appreciate everyone's time and help! Massive thanks.