Implementing training-free RB-Modulation pipeline for most used models

AnandK27 commented 2 months ago

Model/Pipeline/Scheduler description

RB-Modulation is a training-free technique for image-to-image style and content transfer in diffusion models. It has two components:

Stochastic Optimal Control (SOC): This component requires a style evaluator at each timestep, so an evaluator model and a control-function pipeline have to be built.
Attention Feature Aggregation (AFA): This needs a CLIP image encoder so that the K and V features of the image and the caption can be concatenated. It requires a small change to the forward pass of the existing models.
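To make the AFA idea concrete, here is a minimal sketch in plain Python of attention over concatenated key/value sets, as described above. It is an illustration only, not the code from the RB-Modulation repository: the function names (`attend`, `aggregated_attention`) and the toy vectors are hypothetical, and a real implementation would operate on batched tensors inside the model's cross-attention layers, with the image features coming from a CLIP image encoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-query scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def aggregated_attention(query, text_kv, image_kv):
    """Hypothetical helper: attend over caption and image K/V concatenated.

    text_kv and image_kv are (keys, values) pairs. Concatenating along the
    token axis lets one query draw on both the caption and the reference image.
    """
    text_k, text_v = text_kv
    image_k, image_v = image_kv
    return attend(query, text_k + image_k, text_v + image_v)


# Toy example: one caption token and one image token with identical keys,
# so the query weights both values equally.
query = [1.0, 0.0]
text_kv = ([[1.0, 0.0]], [[1.0, 0.0]])
image_kv = ([[1.0, 0.0]], [[0.0, 1.0]])
output = aggregated_attention(query, text_kv, image_kv)
```

In a pipeline, this concatenation would presumably sit inside the attention processor of each cross-attention block, which is why the forward pass of existing models needs a tweak.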

This will be an interesting implementation for edits as the paper shows promising results.

Open source status

[X] The model implementation is available.
[X] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

RB-Modulation:

Title: RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control
Code Link: https://github.com/google/RB-Modulation
Authors: Litu Rout and Yujia Chen and Nataniel Ruiz and Abhishek Kumar and Constantine Caramanis and Sanjay Shakkottai and Wen-Sheng Chu
Authors GH Username: @LituRout, @IssacCyj

Style Evaluator:

Title: Measuring Style Similarity in Diffusion Models
Code Link: https://github.com/learn2phoenix/CSD
Authors: Somepalli, Gowthami and Gupta, Anubhav and Gupta, Kamal and Palta, Shramay and Goldblum, Micah and Geiping, Jonas and Shrivastava, Abhinav and Goldstein, Tom
Authors Username: @somepago, @learn2phoenix

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

a-r-r-o-w commented 2 days ago

cc @sayakpaul. This is indeed interesting, and some power users make use of the technique via Comfy. Maybe something to consider for modular diffusers, because supporting it might require multiple changes to our pipelines.

sayakpaul commented 2 days ago

Could be, but I'm unsure of the quality/performance trade-off given existing techniques. With models like OmniGen (which we may support soon), the complementary benefits of RB-Modulation (and similar techniques) might diminish.
