jiwoogit / StyleID

[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
MIT License

Papers in details #3

Closed JustinKai0527 closed 4 months ago

JustinKai0527 commented 4 months ago

Hi, thanks for this wonderful work! The results are astonishing. I have a question about the color tone discussed in Section 4.2 of the arXiv paper. In the picture below, Q, K, and V are all fed from the style image features, so I thought the output depends only on the style image. Where, then, do the content image features enter this process? Is that what the query blending in Section 4.1 (second picture) does? Thanks! [image] [image]

jiwoogit commented 4 months ago

Thank you for your interest!

I think that there might be some confusion about the details of the ablated setting (b).

To generate the results of (b), we didn't use the Q, K, V self-attention features of the content images, and we didn't use query blending.

In other words, we injected the features of the style images into all self-attention layers: $Q^{cs}_t = Q^{s}_t$, $K^{cs}_t = K^{s}_t$, $V^{cs}_t = V^{s}_t$.
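As a rough illustration of this ablated setting, here is a toy sketch in plain Python. It is not the StyleID code: `attention`, `injected_attention`, and the tiny hand-written feature lists are hypothetical names and values chosen only for the example. It shows that once Q, K, and V are all replaced by the style features, the attention output itself no longer depends on the content features.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention over lists of token vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out


def injected_attention(content_qkv, style_qkv):
    """Toy version of setting (b): the content Q, K, V are ignored entirely
    and the style Q, K, V are used instead (no query blending)."""
    q_s, k_s, v_s = style_qkv
    return attention(q_s, k_s, v_s)


# Hypothetical style features (Q, K, V), two tokens of dimension two.
style = (
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
)

# Two very different hypothetical content feature sets.
content_a = ([[0.5, 0.5], [0.1, 0.9]],) * 3
content_b = ([[9.0, -3.0], [4.0, 4.0]],) * 3

out_a = injected_attention(content_a, style)
out_b = injected_attention(content_b, style)

# The attention output is identical for both content inputs.
print(out_a == out_b)
```

In this sketch the content features never reach the attention output, so any remaining content influence has to come from elsewhere in the network.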

Thus, the generated results are similar to those of the style images, but they don't change in terms of the color tone.

So, we were motivated to modify the initial latent noise ($z^c_T$).

Feel free to ask if you need further clarification.

JustinKai0527 commented 4 months ago

@jiwoogit Thanks for the response. I am still confused about one thing: Q, K, and V all come from the style image, so the attention output is $\phi_{\text{out}} = \text{Attn}(Q^s_t, K^s_t, V^s_t)$. Doesn't that mean the output $\phi_{\text{out}}$ is not influenced by the latent noise $z^c_T$?

jiwoogit commented 4 months ago

In the Stable Diffusion architecture, the latent noise ($z^c_T$) may influence the output through two paths:

  1. Skip connections in the U-Net architecture.
  2. Residual connections in the self-attention layers: $\phi_{\text{final}} = \phi + \phi_{\text{out}}$.
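
The second path can be sketched with a toy example. This is only an illustration under assumed values, not the Stable Diffusion implementation: `attention_block` and the three-element vectors are hypothetical. The attention output $\phi_{\text{out}}$ is held fixed (as if it came purely from style features), while the block input $\phi$ stands in for features derived from two different initial latents.

```python
def attention_block(phi, phi_out):
    """Residual connection around self-attention: phi_final = phi + phi_out."""
    return [a + b for a, b in zip(phi, phi_out)]


# Fixed attention output, as if computed only from style Q, K, V (assumed values).
phi_out = [0.3, -0.2, 0.5]

# Block inputs derived from two different initial latents (assumed values).
phi_from_latent_1 = [1.0, 0.0, -1.0]
phi_from_latent_2 = [0.0, 2.0, 1.0]

final_1 = attention_block(phi_from_latent_1, phi_out)
final_2 = attention_block(phi_from_latent_2, phi_out)

# Same attention output, different block outputs: the latent still matters.
print(final_1 != final_2)
```

Even with an identical $\phi_{\text{out}}$, the two block outputs differ by exactly the difference between the inputs, which is how the initial latent can still shape the result.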

Hope this helps! Thank you.

JustinKai0527 commented 4 months ago

Thanks for the quick response. Your work is really impressive!