jiwoogit / StyleID

[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
MIT License

Question about the context in cross attention #4

Closed PeiChiChen closed 4 months ago

PeiChiChen commented 4 months ago

Hello, thanks for your excellent work! And congratulations on being accepted as a CVPR 2024 Highlight!

I have a question about the context in the cross-attention layers of Stable Diffusion. In your work, since you don't give any condition, I expected the context in cross-attention to be None, so that the attention would be the same as self-attention. However, when I run the code, I find that the context has a value. Am I missing something? I am a beginner with Stable Diffusion, so I don't know whether this is part of your design or whether Stable Diffusion itself supplies a default context when none is given.

I look forward to your reply. Thanks again!

jiwoogit commented 4 months ago

Thank you for your interest!

In Stable Diffusion, training uses text caption embeddings together with null (empty-prompt) text embeddings for a fraction of the samples (maybe 10%), so that the model supports both conditional and unconditional generation. Therefore, cross-attention is still used in our model, even when generating unconditionally.

Specifically, we use uc = model.get_learned_conditioning([""]) (see line 180 of run_styleid.py). This is the embedding of the empty prompt, which is different from None.
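As a rough illustration of why this matters, here is a toy sketch in plain Python. It is not the StyleID or Stable Diffusion code: the `attention` function has no learned projections, and `null_context` is a made-up stand-in for the empty-prompt embedding. It assumes the common convention that an attention layer falls back to attending over its own input when the context is None.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, context=None):
    """Toy single-head attention without learned projections.

    If context is None, keys and values come from x itself, so the
    layer behaves as self-attention (assumed fallback convention).
    """
    if context is None:
        context = x
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product scores of this query against every context token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


# Toy "image" tokens.
x = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical values standing in for the embedding of the empty prompt "".
null_context = [[0.5, 0.5], [0.2, -0.3], [0.1, 0.9]]

self_out = attention(x)                         # context=None: self-attention
same_out = attention(x, context=x)              # explicit self-attention
cross_out = attention(x, context=null_context)  # attends to the null embedding
```

With context=None the output matches explicit self-attention, while passing the null embedding makes the layer attend to different keys and values. This is why an empty-prompt context is not the same as having no context.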

PeiChiChen commented 4 months ago

OK, got it! Thanks for your reply!