zhihongp closed this issue 1 year ago
Hello! Here we're using "unconditional" to refer to the inversion technique used to get the latents. During generation (latents to new image), the original caption is used to generate the attention maps, as in prompt-to-prompt. The expectation is that the inversion and generation processes aren't fully aligned, but this combines the stability of unconditional (UC) inversion with the control of prompt-to-prompt. Does that explanation make sense?
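To make the two phases concrete, here is a minimal toy sketch of the idea: deterministic DDIM inversion run with a null (empty-text) condition, then deterministic DDIM sampling run back with the caption. The noise predictor, schedule, and all names below are illustrative stand-ins, not the actual repo or diffusers API:

```python
import numpy as np

# Toy stand-in for the LDM's noise predictor eps_theta(x_t, t, c).
# In the real pipeline this is the conditional UNet; here it is a simple
# linear function so the script runs standalone.
def eps_theta(x, t, cond):
    return 0.1 * x + (0.05 * cond if cond is not None else 0.0)

alphas = np.linspace(0.99, 0.90, 10)  # toy alpha_bar schedule, t = 0..9

def ddim_step(x, t_from, t_to, cond):
    # Deterministic DDIM update between two timesteps (eta = 0).
    a_from, a_to = alphas[t_from], alphas[t_to]
    eps = eps_theta(x, t_from, cond)
    x0 = (x - np.sqrt(1 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0 + np.sqrt(1 - a_to) * eps

# Phase 1 -- unconditional inversion: run DDIM forward (t: 0 -> T)
# with cond=None standing in for the empty-text embedding.
x = np.array([1.0])
for t in range(9):
    x = ddim_step(x, t, t + 1, cond=None)

# Phase 2 -- conditional generation: run DDIM backward (t: T -> 0)
# with the caption; this is where prompt-to-prompt would inject its
# cross-attention control.
for t in range(9, 0, -1):
    x = ddim_step(x, t, t - 1, cond=1.0)
```

Because the forward pass ignores the text while the backward pass uses it, the two trajectories don't match exactly, which is the "not fully aligned" point above.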
Marking as closed but feel free to re-open for further discussions
Thanks for the quick response. A quick follow-up for further clarification: 1) Since the LDM itself is conditioned on text inputs, UC inversion uses the empty text input (the same one used in classifier-free guidance), right? 2) Then for DDIM UC Rec. (Table 1), it's UC for both inversion and reconstruction; but for P2P DDIM UC, it's UC for inversion and conditional (with classifier-free guidance) for generation/editing, right?
I am confused by this term: "unconditional" here seems to mean not conditioned on text, as I understand it, but P2P (prompt-to-prompt) requires two sets of text prompts as conditions. Aren't these two self-contradictory?