google / RB-Modulation

Official code for "RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control"
https://rb-modulation.github.io/
Apache License 2.0

Attention Processor Implementations for SDXL #4

Open dibbla opened 1 month ago

dibbla commented 1 month ago

Hi!

I am working on migrating this great work to SDXL, starting with AFA. However, I found that neither direct cascade nor AFA works.

I am using CLIP-ViT-bigG (projected to 1280) and CLIP-ViT-Large (projected to 768) for the image embedding, and I concatenate them so that 768 + 1280 = 2048 matches SDXL's text embedding size. The concatenated vector is treated as a single token, and I pass it to the attention processors. Inside the attention, the image embedding is repeated 77 times to match the sequence length of the text features.
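
To make this concrete, here is a rough PyTorch sketch of what I mean (the tensors are placeholders, and I assume the projections to 1280/768 have already been applied):

```python
import torch

batch = 1

# Pooled image embeddings from the two encoders, already projected:
# CLIP-ViT-bigG -> 1280, CLIP-ViT-Large -> 768 (placeholder tensors here).
emb_bigg = torch.randn(batch, 1280)
emb_large = torch.randn(batch, 768)

# Concatenate along the channel dim: 768 + 1280 = 2048, which matches
# SDXL's cross-attention (text embedding) width.
image_token = torch.cat([emb_large, emb_bigg], dim=-1)  # (batch, 2048)

# Treat it as a single token, then repeat it 77 times so the sequence
# length matches the text features inside the attention processors.
image_tokens = image_token.unsqueeze(1).repeat(1, 77, 1)  # (batch, 77, 2048)
```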

However, even when using the direct concatenation method, it still does not work. Any suggestions?
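
For reference, this is the kind of "direct concatenation" cross-attention processor I am experimenting with. It is a rough sketch in the style of diffusers attention processors, not the official RB-Modulation/AFA code; `image_tokens` is the `(batch, 77, 2048)` tensor built above, and the batch is assumed to already match the UNet's (e.g. repeated for classifier-free guidance):

```python
import torch
import torch.nn.functional as F


class ConcatImageAttnProcessor:
    """Appends repeated image tokens to the text tokens in cross-attention."""

    def __init__(self, image_tokens):
        # (batch, 77, 2048); batch must match the UNet batch (e.g. 2x for CFG).
        self.image_tokens = image_tokens

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, temb=None, *args, **kwargs):
        if encoder_hidden_states is None:
            # Self-attention layer: leave the plain path untouched.
            encoder_hidden_states = hidden_states
        else:
            # Cross-attention: concatenate the image tokens after the text
            # tokens so both are attended to jointly (norm_cross is skipped
            # here for brevity).
            encoder_hidden_states = torch.cat(
                [encoder_hidden_states, self.image_tokens], dim=1)

        query = attn.to_q(hidden_states)
        key = attn.to_k(encoder_hidden_states)
        value = attn.to_v(encoder_hidden_states)

        # (batch, seq, dim) -> (batch, heads, seq, head_dim) for SDPA.
        batch, _, _ = query.shape
        head_dim = query.shape[-1] // attn.heads
        query = query.view(batch, -1, attn.heads, head_dim).transpose(1, 2)
        key = key.view(batch, -1, attn.heads, head_dim).transpose(1, 2)
        value = value.view(batch, -1, attn.heads, head_dim).transpose(1, 2)

        out = F.scaled_dot_product_attention(query, key, value)
        out = out.transpose(1, 2).reshape(batch, -1, attn.heads * head_dim)

        out = attn.to_out[0](out)   # linear projection
        out = attn.to_out[1](out)   # dropout
        return out
```

I install something like this only on the `attn2` (cross-attention) modules via `unet.set_attn_processor`, keeping the default processor on the self-attention layers.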