google / prompt-to-prompt

Apache License 2.0

On the non-convergence of null text #69

Closed YasminZhang closed 10 months ago

YasminZhang commented 10 months ago

I tried to replicate the null-text inversion experiment. However, with the default settings from the paper, the loss at each step behaves as shown below: it accumulates over the diffusion steps, and the MSE at the last step is 0.36, which is far too large. My reconstruction result is also bad. Does anyone else have the same problem?

```python
image_path = "./example_images/gnochi_mirror.jpeg"
prompt = "a cat sitting next to a mirror"
(image_gt, image_enc), x_t, uncond_embeddings = null_inversion.invert(
    image_path, prompt, offsets=(0, 0, 200, 0), verbose=True,
    num_inner_steps=10, lr=1e-2, betas=(0.9, 0.99))

prompts = [prompt]
controller = AttentionStore()
image_inv, x_t = run_and_display(prompts, controller, run_baseline=False,
                                 latent=x_t, uncond_embeddings=uncond_embeddings,
                                 verbose=False)
print("showing from left to right: the ground truth image, "
      "the vq-autoencoder reconstruction, Null-text")
ptp_utils.view_images([image_gt, image_enc, image_inv[0]])
show_cross_attention(controller, 16, ["up", "down"])
```

The result (diffusion step, last inner iteration, MSE loss):

```
 0, 9: 9.994431820814498e-06
 1, 0: 8.110590897558723e-06
 2, 0: 6.078827027522493e-06
 3, 0: 4.616351361619309e-06
 4, 0: 3.930223101633601e-06
 5, 0: 3.9338251553999726e-06
 6, 0: 4.551481197268004e-06
 7, 0: 6.7836335801985115e-06
 8, 0: 1.2606915333890356e-05
 9, 0: 2.6019246433861554e-05
10, 0: 5.095220694784075e-05
11, 0: 8.601272566011176e-05
12, 0: 0.00012563372729346156
13, 0: 0.00018725049449130893
14, 0: 0.00028214900521561503
15, 9: 0.00032002816442400217
16, 9: 0.00040903716580942273
17, 9: 0.0005712606944143772
18, 9: 0.0008321917266584933
19, 9: 0.0012226687977090478
20, 9: 0.0017885318957269192
21, 9: 0.002541328314691782
22, 9: 0.0035280180163681507
23, 9: 0.0049383495934307575
24, 9: 0.007051061373203993
25, 9: 0.009335105307400227
26, 9: 0.011939611285924911
27, 9: 0.015065964311361313
28, 9: 0.018760032951831818
29, 9: 0.02296195924282074
30, 9: 0.027981098741292953
31, 9: 0.03358181565999985
32, 9: 0.03978685662150383
33, 9: 0.04694744944572449
34, 9: 0.054768793284893036
35, 9: 0.0635022521018982
36, 9: 0.07332171499729156
37, 9: 0.0840243250131607
38, 9: 0.09570176899433136
39, 9: 0.10852541029453278
40, 9: 0.12246782332658768
41, 9: 0.13685467839241028
42, 9: 0.15146127343177795
43, 9: 0.1673772931098938
44, 9: 0.18442586064338684
45, 9: 0.2022503912448883
46, 9: 0.2216768115758896
47, 9: 0.2423238456249237
48, 9: 0.26619306206703186
49, 9: 0.36836767196655273
```
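If I read the log right, the second number is the last inner-iteration index the optimizer reached: `0` means the loop hit the early-stopping threshold almost immediately, while `9` means it exhausted all `num_inner_steps=10` iterations without converging. From step 15 on (and at step 0), the loop maxes out its budget, so the residual error is carried into the next step's target and compounds. A toy sketch of that control flow (plain Python; `optimize_step` and all values here are hypothetical illustrations, not the repo's actual code):

```python
def optimize_step(x, target, lr=1e-2, num_inner_steps=10, epsilon=1e-5):
    """Gradient descent on a scalar offset so that x + offset ~= target.

    Mirrors the shape of the null-text inner loop: at most
    num_inner_steps iterations, with an early exit once the loss
    drops below epsilon.  Returns the optimized offset and the index
    of the last inner iteration that ran.
    """
    offset, last_j = 0.0, 0
    for j in range(num_inner_steps):
        last_j = j
        err = (x + offset) - target      # residual against this step's target
        if err * err < epsilon:          # early stopping, as in the notebook
            break
        offset -= lr * 2 * err           # gradient of err**2 w.r.t. offset
    return offset, last_j

# When the target drifts faster than 10 iterations at lr=1e-2 can follow,
# every step exhausts its budget (j stays at 9) and the leftover error
# is inherited by the next step, so the loss grows step over step --
# the same pattern as in the log above.
x = 0.0
for t, target in enumerate(0.5 * (k + 1) for k in range(5)):
    offset, j = optimize_step(x, target)
    x = x + offset                       # carry the optimized state forward
    print(t, ",", j, ":", (x - target) ** 2)
```

With these toy numbers every step prints `j = 9` and a strictly increasing loss; in the real code, raising `num_inner_steps` or the learning rate (or loosening `epsilon`) shifts where this trade-off lands.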