Bai-YT opened this issue 5 months ago
@sanchit-gandhi @Vaibhavs10 FYI.
@Bai-YT Thank you for your awesome work! I just finished reading the paper and think that I have a good grasp of the modeling and inference code needed to convert it to diffusers.
@sayakpaul Could I pick this up if no one's working on it?
Yeah for sure.
@a-r-r-o-w Cool! But let's put it in the community folder to start with.
Sure, sounds good.
I appreciate everyone's time helping with this! Massive thanks.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi everyone,
Thank you for the effort in adding ConsistencyTTA into diffusers! I just hoped to kindly check in to see if there has been any update. If there's anything I can help with, please feel free to let me know!
Sincerely, Yatong
Hi @Bai-YT, thanks for your awesome work! We do have a PR open here, but we also had different plans on how to support it (relevant discussion in the PR). The pipeline works and one can run inference, but I haven't found the time to implement what was discussed in the PR yet. I will try giving it a shot in the near future.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@sayakpaul Not stale. At some point, it would be nice for Diffusers community scripts to support modeling + pipeline code in a single file. This is currently not supported (my PR uses a separate file for the modeling code, which lives on the Hub, and a separate file for the pipeline, which lives in the Diffusers community folder; YiYi mentioned her concerns with this approach, so we didn't proceed with it).
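As a rough sketch of the current split (pipeline file in the Diffusers community folder, modeling code and weights on the Hub), loading would look roughly like the snippet below; the custom_pipeline name is hypothetical and the exact layout may change once we settle on the final approach:

```python
from diffusers import DiffusionPipeline

# Sketch only — names are assumptions, not the final integration:
# the pipeline file is selected via custom_pipeline (community folder),
# while the modeling code and weights are pulled from the Hub repo,
# which is why trust_remote_code may be required.
pipe = DiffusionPipeline.from_pretrained(
    "Bai-YT/ConsistencyTTA",
    custom_pipeline="consistency_tta",  # hypothetical community pipeline name
    trust_remote_code=True,
)
```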
Hi Aryan, sorry I just saw the message. Thank you very much for handling this!
I took a look at the PR and it looks awesome! From an algorithmic perspective, I just wanted to mention two things:
Setting guidance_scale to a number other than 1 will likely not perform very well; CFG is instead handled by the model internally via guidance_scale_cond, which should be much more powerful and perform much better.
I absolutely understand that these options are there for compatibility with other models in the API, and it's very nice to have them here. I'm not requesting any code changes at all, but perhaps it might be worth mentioning this in the documentation so that users can have an idea.
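For illustration, here is a minimal sketch of what a call might look like under this convention; the loading step and the exact argument names follow the PR discussion and are assumptions, not the final API:

```python
from diffusers import DiffusionPipeline

# Sketch only — repo id is real, but the custom_pipeline name, argument
# names, and output access are assumptions based on the PR discussion.
pipe = DiffusionPipeline.from_pretrained(
    "Bai-YT/ConsistencyTTA", custom_pipeline="consistency_tta", trust_remote_code=True
)
audio = pipe(
    prompt="a dog barking in the distance",
    guidance_scale=1.0,       # keep the generic CFG knob at 1; other values likely degrade quality
    guidance_scale_cond=4.0,  # CFG handled internally by the distilled model (value is illustrative)
).audios[0]                   # output access assumed to follow other diffusers audio pipelines
```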
I wish you a nice rest of the day!
Model/Pipeline/Scheduler description
ConsistencyTTA, introduced in the paper Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation, is an efficient text-to-audio (TTA) generation model. Compared to an equivalent diffusion-based TTA model, ConsistencyTTA achieves a 400x generation speed-up while retaining generation quality and diversity.
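For intuition on where the speed-up comes from, here is a purely illustrative, minimal sketch (toy network and placeholder update rule, not the actual ConsistencyTTA architecture): a diffusion sampler queries the denoiser once per step, whereas a consistency model maps noise to a sample in a single query.

```python
import torch

# Toy stand-in for a denoiser network; dimensions are illustrative only.
class ToyDenoiser(torch.nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = torch.nn.Linear(dim + 1, dim)

    def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
        t_col = torch.full((x.shape[0], 1), float(t))
        return self.net(torch.cat([x, t_col], dim=-1))

model = ToyDenoiser()
noise = torch.randn(1, 64)

# Diffusion-style sampling: hundreds of network evaluations
# (the update rule here is a placeholder, not a real sampler).
x = noise.clone()
for step in reversed(range(400)):
    x = x - 0.001 * model(x, step)

# Consistency-style sampling: a single network evaluation maps noise
# directly to the sample, which is where the ~400x speed-up comes from.
x_single = model(noise, 399)
```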
Due to its high generation quality and fast inference, we believe integrating this model into diffusers will make diffusers more appealing to text-to-audio generation researchers and users! Thank you very much.

Open source status
Provide useful links for the implementation
The open-source code implementation can be found at https://github.com/Bai-YT/ConsistencyTTA.
There is also a simplified implementation for inference only: https://github.com/Bai-YT/ConsistencyTTA/tree/main/easy_inference.
The model checkpoints can be found at https://huggingface.co/Bai-YT/ConsistencyTTA.
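For anyone who wants to experiment before the diffusers integration lands, the checkpoints can be fetched with huggingface_hub; this simply downloads the Hub repo linked above:

```python
from huggingface_hub import snapshot_download

# Download the ConsistencyTTA checkpoints from the Hub repo linked above.
local_dir = snapshot_download(repo_id="Bai-YT/ConsistencyTTA")
print(local_dir)  # path of the local snapshot
```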
I am the main author of the code, and I am more than happy to assist with the integration.