sled-group / InfEdit

[CVPR 2024] Official implementation of CVPR 2024 paper: "Inversion-Free Image Editing with Natural Language"
https://sled-group.github.io/InfEdit/

Implementation of Algorithm 2 #26

Closed backpropagator closed 2 months ago

backpropagator commented 3 months ago

Thanks for the amazing paper, I really loved reading it.

I can see that the code provided in pipeline_ead.py is for Algorithm 3 (which includes using a layout branch). I wanted to know if there are any plans to provide the code for Algorithm 2, or whether there are some minor changes one can make to get it.

Thanks.

FreeButUselessSoul commented 3 months ago

I think backpropagator is referring to the implementation of "Algorithm 2 DDCM for inversion-free image editing"? One can find it in pipeline_ead.py, line 50, in the definition of ddcm_sampler(). Hope this helps :)

backpropagator commented 2 months ago

Thanks for the prompt reply. I thought the same, but after carefully going through the code, I think Lines 83 and 84 are not how they should be? These lines resemble the reverse sampling Equation (3) of the paper. However, Lines 7 and 8 of Algorithm 2 of the paper seem to suggest that the direction term should not be used.

Am I missing something here?

Again, thanks for your response.

h6kplus commented 2 months ago

Hi! Thank you for your interest in our paper.

For Algorithm 3, you can find the implementation in class AttentionControlEdit() in app_infedit.py, where both the self-attention control and the cross-attention control are implemented. If you want to see how these controls are injected, you can refer to register_attention_control() in ptp_utils.

For Algorithm 2, the two lines you referred to are the implementation of Lines 7 and 8 of Algorithm 2. The two dir_xt terms vanish (become 0) because eta = 1 here, as you can see on line 79.
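
To illustrate why the dir_xt terms drop out, here is a minimal sketch that assumes the standard DDIM-style update with sigma_t = eta * sqrt(1 - alpha_prev), where alpha_prev stands for the cumulative alpha at step t-1. The function names below are hypothetical and are not taken from pipeline_ead.py; this is not the repository's code, only a toy check of the coefficient.

```python
import math


def direction_coeff(alpha_prev: float, eta: float) -> float:
    """Coefficient of the 'direction pointing to x_t' term in a DDIM-style step.

    Assumes sigma_t = eta * sqrt(1 - alpha_prev) (hypothetical helper).
    """
    sigma_t = eta * math.sqrt(1.0 - alpha_prev)
    # Clamp at 0 so floating-point rounding cannot make the radicand negative.
    return math.sqrt(max(1.0 - alpha_prev - sigma_t ** 2, 0.0))


def ddim_style_step(pred_x0, eps, noise, alpha_prev, eta):
    """One DDIM-style update on plain Python lists (toy stand-in for tensors)."""
    sigma_t = eta * math.sqrt(1.0 - alpha_prev)
    dir_coeff = direction_coeff(alpha_prev, eta)
    return [
        math.sqrt(alpha_prev) * x0 + dir_coeff * e + sigma_t * n
        for x0, e, n in zip(pred_x0, eps, noise)
    ]


if __name__ == "__main__":
    for eta in (0.0, 0.5, 1.0):
        print(f"eta={eta}: direction coefficient = {direction_coeff(0.7, eta):.6f}")
```

With eta = 1 the direction coefficient is zero up to floating-point rounding, so the step reduces to the predicted x0 plus fresh noise; for any eta below 1 the predicted noise eps contributes again.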

Hope it can help you!

backpropagator commented 2 months ago

Thanks for the quick reply! Really appreciate it.

Algorithm 3 seems fine. And thanks for the explanation, that clears up Algorithm 2 as well. I was just wondering if there is any reason to keep the dir_xt terms in the code?

I mean, for any eta != 1, e_c won't be as desired (as described in Section 3 of the paper). So, is there any reason to keep that term there, or can we just remove it?

h6kplus commented 2 months ago

We haven't investigated the impact of different eta values on the performance of the model. Here eta serves as a coefficient that controls the injection of the noise, and it follows the setting $\sigma_t = \eta\sqrt{1-\alpha_{t-1}}$, where we use $\eta=1$.
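
As a worked step, assuming the direction term carries the usual DDIM coefficient $\sqrt{1-\alpha_{t-1}-\sigma_t^2}$ (an assumption about the form of the thread's Equation (3), not something stated in this thread), substituting $\sigma_t = \eta\sqrt{1-\alpha_{t-1}}$ gives:

$$
\sqrt{1-\alpha_{t-1}-\sigma_t^2}
= \sqrt{1-\alpha_{t-1}-\eta^2\,(1-\alpha_{t-1})}
= \sqrt{(1-\alpha_{t-1})(1-\eta^2)}
$$

This is exactly 0 for $\eta=1$, and strictly positive for any $0 \le \eta < 1$ whenever $\alpha_{t-1} < 1$.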