pix2pixzero / pix2pix-zero

Zero-shot Image-to-Image Translation [SIGGRAPH 2023]
https://pix2pixzero.github.io/
MIT License

How to edit *with* a prompt the best way? #4

Open apolinario opened 1 year ago

apolinario commented 1 year ago

Hi everyone! Amazing work, congratulations on the paper and thanks for open-sourcing the code!

From what I understood and could test, the src/edit_real.py script performs an edit without needing a user-provided prompt. It takes the prompt that BLIP generated in the src/inversion.py step and uses that same prompt as both the regular and the negative prompt, while the actual editing direction comes from the construct_direction function, which contains the example cat2dog direction.
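To make that reading concrete, here is a toy sketch of how I picture the pieces fitting together. It assumes the edit direction is simply added to the conditional prompt embedding while the negative prompt embedding stays as the unmodified BLIP caption; that is my assumption, not something confirmed in this thread. The helper `apply_edit_direction` and the 4-dimensional vectors are hypothetical stand-ins for real text-encoder embeddings.

```python
def apply_edit_direction(cond_emb, direction):
    """Shift a prompt embedding by an edit direction, element-wise."""
    if len(cond_emb) != len(direction):
        raise ValueError("embedding and direction must have the same length")
    return [c + d for c, d in zip(cond_emb, direction)]


# Toy 4-dimensional vectors standing in for real text-encoder embeddings.
blip_prompt_emb = [0.5, -1.0, 0.25, 2.0]    # hypothetical embedding of the BLIP caption
cat2dog_direction = [0.5, 0.5, -0.25, 0.0]  # hypothetical edit direction

# Assumption: the same BLIP caption is reused, unchanged, as the negative prompt.
negative_emb = list(blip_prompt_emb)

# Assumption: only the conditional embedding is shifted by the direction.
edited_emb = apply_edit_direction(blip_prompt_emb, cat2dog_direction)
print(edited_emb)
```

In this picture no user prompt is involved at all: the caption only anchors the reconstruction, and the direction carries the edit.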

So one way to use the code would be to run the make_edit_direction.py code with sentences about the two concepts I would like to translate between (say, from horse to rabbit). Is that right?

However, if instead, I do want to use a prompt, what is the best way?

From the paper:

We generate a large bank of diverse sentences for both source s and the target t, either using an off-the-shelf sentence generator like GPT-3 or by using predefined prompts around source and target.
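As a rough sketch of what I take that to mean: build a sentence bank for each concept from predefined templates, embed every sentence, and use the difference between the mean target embedding and the mean source embedding as the edit direction. The mean-difference step is my understanding of the method, not code taken from the repository. The names `TEMPLATES`, `sentence_bank`, `toy_embed`, and `edit_direction` are hypothetical, and `toy_embed` is a deliberately trivial stand-in for a real text encoder.

```python
TEMPLATES = [
    "a photo of a {}",
    "a painting of a {}",
    "a {} in a field",
]


def sentence_bank(concept):
    """Build a small bank of sentences around one concept from fixed templates."""
    return [template.format(concept) for template in TEMPLATES]


def toy_embed(sentence):
    """NOT a real text encoder: returns [length, vowel count] for illustration only."""
    return [float(len(sentence)), float(sum(ch in "aeiou" for ch in sentence))]


def mean_vector(vectors):
    """Average a list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def edit_direction(source, target, embed=toy_embed):
    """Mean target embedding minus mean source embedding."""
    source_mean = mean_vector([embed(s) for s in sentence_bank(source)])
    target_mean = mean_vector([embed(s) for s in sentence_bank(target)])
    return [t - s for s, t in zip(source_mean, target_mean)]


direction = edit_direction("horse", "rabbit")
print(direction)
```

With a real encoder passed in as `embed`, a larger and more diverse sentence bank should average out wording-specific noise, so the direction mostly reflects the horse-to-rabbit change.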

So from what I understand, this is how to edit with a prompt (without a custom edit direction calculated with multiple sentences):

Is that the best/correct way to do it?

pix2pixzero commented 1 year ago

Hi,

Thank you for your interest in this work! The approach you mentioned makes sense. However, for real-image editing I would try also using the target prompt as the source prompt. Let me know if this does not give the desired results!

Regards, Authors

apolinario commented 1 year ago

Hi! Thank you for getting back to me. Here are my experiments:

Source Image: cat_1

Source prompt generated by BLIP: a painting of a cat sitting on top of a blue ball

Target prompts:

Strategy 1

(two images)

Strategy 2

(two images)

Strategy 3

So indeed, strategy 1, the one you suggested, seems to work best! Thanks