Compared to Imagic, for example:
This is the null-text method:
prompts = ["A dog",
"A sitting dog"
]
This is Imagic:
The complete inference code is as follows:
image_path = "./imgs/dog2.png"
prompt = "A dog"
# offsets=(0,0,200,0)  # optional crop offsets for invert(); disabled for this image
(image_gt, image_enc), x_t, uncond_embeddings = null_inversion.invert(image_path, prompt, verbose=True)
print("Modify or remove offsets according to your image!")
prompts = [prompt]
controller = AttentionStore()
# Reconstruct the image from the inverted latent x_t and the optimized null-text embeddings
image_inv, x_t = run_and_display(prompts, controller, run_baseline=False, latent=x_t, uncond_embeddings=uncond_embeddings, verbose=False)
print("showing from left to right: the ground truth image, the vq-autoencoder reconstruction, the null-text inverted image")
ptp_utils.view_images([image_gt, image_enc, image_inv[0]])
show_cross_attention(controller, 16, ["up", "down"])
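After this reconstruction check, the actual edit uses a second prompt. A minimal sketch of that step, assuming the make_controller helper from the null-text notebook (the step fractions and blend words below are illustrative values, not tuned settings):

prompts = ["A dog",
           "A sitting dog"]
# The target prompt adds a word, so use a refinement controller (is_replace_controller=False)
cross_replace_steps = {'default_': .8}  # fraction of diffusion steps with cross-attention injection
self_replace_steps = .5                 # fraction of diffusion steps with self-attention injection
blend_word = ((('dog',), ('dog',)))     # local blend: restrict changes to the dog region
controller = make_controller(prompts, False, cross_replace_steps, self_replace_steps, blend_word)
images, _ = run_and_display(prompts, controller, run_baseline=False, latent=x_t, uncond_embeddings=uncond_embeddings)
ptp_utils.view_images(images)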
I think so! Prompt-to-Prompt requires the attention maps to stay similar, which means the shape and position of a subject/object should remain almost unchanged. P2P is good for appearance changes.
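For example, "A dog" -> "A white dog" changes only the appearance and tends to work, while "A sitting dog" changes the pose and layout, which P2P struggles with. A sketch of such an appearance edit with the same notebook helpers (parameter values are illustrative, not tuned):

prompts = ["A dog",
           "A white dog"]
cross_replace_steps = {'default_': .8}
self_replace_steps = .4
blend_word = ((('dog',), ('dog',)))                # keep the edit local to the dog region
eq_params = {"words": ("white",), "values": (2,)}  # amplify attention to the word "white"
controller = make_controller(prompts, False, cross_replace_steps, self_replace_steps, blend_word, eq_params)
images, _ = run_and_display(prompts, controller, run_baseline=False, latent=x_t, uncond_embeddings=uncond_embeddings)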
I tried many parameters but couldn't edit this dog. Is this a limitation of the current method?