How to compute the text embedding and edit direction?

pix2pixzero / pix2pix-zero

Zero-shot Image-to-Image Translation [SIGGRAPH 2023]

https://pix2pixzero.github.io/

MIT License


How to compute the text embedding and edit direction? #1

Closed by fortunechen 1 year ago

fortunechen commented 1 year ago

Hi, authors!

Congrats on your great work. I wonder how the text embedding and the edit direction are computed?

If the text embedding has shape 1x768, every entry of the cross-attention map will be 1 (the softmax is taken over a single token), which makes the cross-attention map meaningless.

If the text embedding has shape Nx768 (N=77 in CLIP), then the edit direction will also be Nx768. However, the edit direction should be a single vector.
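The 1x768 case can be checked numerically. The sketch below is only an illustration and not code from the repository: the helper names and the score values are made up, and plain Python lists stand in for attention tensors. It applies a softmax over the text-token axis, which is how cross-attention weights are normalized.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query_key_scores):
    """Normalize each image location's scores over the text tokens."""
    return [softmax(row) for row in query_key_scores]

# Hypothetical scores: 4 image locations attending to a single text token.
single_token = attention_weights([[0.3], [-2.0], [5.1], [0.0]])

# Hypothetical scores: the same 4 locations attending to 3 text tokens.
multi_token = attention_weights([
    [0.3, 1.2, -0.5],
    [-2.0, 0.1, 0.4],
    [5.1, 2.0, 1.0],
    [0.0, 0.0, 0.0],
])

print(single_token)  # every row is [1.0], whatever the raw score was
print(multi_token)   # rows differ per location and each sums to 1
```

With a single token the normalization leaves no freedom, so the map carries no spatial information. With several tokens the weights vary by location.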

Looking forward to your answer.

Best regards.

pix2pixzero commented 1 year ago

Hi,

Thank you for your interest in our work! The direction vectors are indeed 77x768. We have just uploaded the instructions and the script for computing custom direction vectors. Let us know if you still have any questions or issues computing new directions!
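For readers wondering how a 77x768 direction can be built and used, here is a minimal sketch. It assumes the direction is the difference between the mean embeddings of a set of target sentences and a set of source sentences. That assumption, the function names, and the sentence counts are illustrative and are not taken from the uploaded script. Random numbers stand in for CLIP text-encoder outputs, and nested lists stand in for tensors.

```python
import random

N_TOKENS, DIM = 77, 768  # per-sentence text embedding shape discussed above

def mean_embedding(embs):
    """Element-wise average of a list of N_TOKENS x DIM embeddings."""
    count = len(embs)
    return [
        [sum(e[t][d] for e in embs) / count for d in range(DIM)]
        for t in range(N_TOKENS)
    ]

def edit_direction(source_embs, target_embs):
    """Assumed recipe: mean(target embeddings) - mean(source embeddings)."""
    src = mean_embedding(source_embs)
    tgt = mean_embedding(target_embs)
    return [[t - s for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(src, tgt)]

def fake_embedding(rng):
    """Stand-in for encoding one sentence with a text encoder."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_TOKENS)]

rng = random.Random(0)
source_embs = [fake_embedding(rng) for _ in range(3)]  # e.g. sentences about the source concept
target_embs = [fake_embedding(rng) for _ in range(4)]  # e.g. sentences about the target concept

direction = edit_direction(source_embs, target_embs)

# The direction has the same shape as a prompt embedding, so it can be
# added element-wise to obtain the edited embedding.
prompt_emb = fake_embedding(rng)
edited_emb = [[p + d for p, d in zip(p_row, d_row)]
              for p_row, d_row in zip(prompt_emb, direction)]

print(len(direction), len(direction[0]))
```

Because the direction is added element-wise to an embedding of the same shape, it does not need to be collapsed to a single 768-dimensional vector.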

Regards, Authors

fortunechen commented 1 year ago

Thanks for the explanation and for open-sourcing your project!