zhihongp closed this issue 1 year ago
Hello! Here we're using "unconditional" to refer to the inversion technique used to get the latents. During generation (latents to new image), the original caption is used to generate the attention maps, as in prompt-to-prompt. The expectation is that the inversion and generation processes aren't fully aligned, but this combines the stability of unconditional (UC) inversion with the control of prompt-to-prompt. Does that explanation make sense?
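To make the two phases concrete, here is a minimal toy sketch of the idea: deterministic DDIM inversion run with a null (empty-text) condition, then deterministic DDIM sampling run back with the caption. The noise predictor, schedule, and all names below are illustrative stand-ins, not the actual repo or diffusers API:

```python
import numpy as np

# Toy stand-in for the LDM's noise predictor eps_theta(x_t, t, c).
# In the real pipeline this is the conditional UNet; here it is a simple
# linear function so the script runs standalone.
def eps_theta(x, t, cond):
    return 0.1 * x + (0.05 * cond if cond is not None else 0.0)

alphas = np.linspace(0.99, 0.90, 10)  # toy alpha_bar schedule, t = 0..9

def ddim_step(x, t_from, t_to, cond):
    # Deterministic DDIM update between two timesteps (eta = 0).
    a_from, a_to = alphas[t_from], alphas[t_to]
    eps = eps_theta(x, t_from, cond)
    x0 = (x - np.sqrt(1 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0 + np.sqrt(1 - a_to) * eps

# Phase 1 -- unconditional inversion: run DDIM forward (t: 0 -> T)
# with cond=None standing in for the empty-text embedding.
x = np.array([1.0])
for t in range(9):
    x = ddim_step(x, t, t + 1, cond=None)

# Phase 2 -- conditional generation: run DDIM backward (t: T -> 0)
# with the caption; this is where prompt-to-prompt would inject its
# cross-attention control.
for t in range(9, 0, -1):
    x = ddim_step(x, t, t - 1, cond=1.0)
```

Because the forward pass ignores the text while the backward pass uses it, the two trajectories don't match exactly, which is the "not fully aligned" point above.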
Marking as closed but feel free to re-open for further discussions
Thanks for the quick response. A quick follow-up for further clarification: 1) Since the LDM itself is conditioned on text inputs, UC inversion uses the empty text input (the same one used in classifier-free guidance), right? 2) Then for DDIM UC Rec. (Table 1), it's UC for both inversion and reconstruction; but for P2P DDIM UC, it's UC for inversion and conditional (with classifier-free guidance) for generation/editing, right?
I am confused by this term: "unconditional" here seems to mean not conditioned on text, as I understand it, but P2P (prompt-to-prompt) requires two sets of text prompts as conditions. Aren't these two self-contradictory?