How to edit *with* a prompt the best way?

apolinario commented 1 year ago

Hi everyone! Amazing work, congratulations on the paper and thanks for open-sourcing the code!

From what I understood and could test, the src/edit_real.py script performs an edit without needing a user-supplied prompt. It takes the BLIP-generated prompt from the src/inversion.py step and passes that same prompt as both the regular and the negative prompt; the actual editing direction comes from the construct_direction function, which contains the example cat2dog direction.

So one way to use the code would be to run make_edit_direction.py with sentences about the two concepts I want to translate between (say, from horse to rabbit). Is that right?

However, if I do want to use a prompt instead, what is the best way?

From the paper:

We generate a large bank of diverse sentences for both source s and the target t, either using an off-the-shelf sentence generator like GPT-3 or by using predefined prompts around source and target.
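
One way to read the quoted passage is that the edit direction is the difference between the mean embeddings of the two sentence banks. The sketch below only illustrates that arithmetic on plain Python lists. `toy_embed` is a hypothetical stand-in for a real CLIP text encoder, and `mean_vector` and `mean_direction` are illustrative names, not functions from the repository.

```python
from statistics import fmean

def toy_embed(sentence, dim=4):
    """Hypothetical stand-in for a CLIP text encoder: maps a sentence to a
    small deterministic vector so the arithmetic is easy to follow."""
    vec = [0.0] * dim
    for i, ch in enumerate(sentence):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]

def mean_direction(source_sentences, target_sentences, embed=toy_embed):
    """Assumed edit direction: mean(target embeddings) - mean(source embeddings)."""
    src = mean_vector([embed(s) for s in source_sentences])
    tgt = mean_vector([embed(t) for t in target_sentences])
    return [t - s for s, t in zip(src, tgt)]

# Tiny sentence banks for the horse-to-rabbit example mentioned above.
horse_bank = ["a photo of a horse", "a horse in a field"]
rabbit_bank = ["a photo of a rabbit", "a rabbit in a field"]
direction = mean_direction(horse_bank, rabbit_bank)
```

With a real encoder the banks would be much larger, which is presumably why the paper mentions generating them with GPT-3 or predefined templates.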

So from what I understand, this is how to edit with a prompt (without a custom edit direction calculated with multiple sentences):

Choose a source and a target prompt with my image (say, the source prompt is the BLIP caption, and the target prompt is that caption with some edit)
Calculate the CLIP embeddings of the source and target prompts, and use the difference between them as the edit_dir
In the EditingPipeline, I enter the target prompt as the regular prompt, and the source prompt as the negative prompt

Is that the best/correct way to do it?
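
The three steps above can be sketched as follows. This is a minimal illustration, not the repository's code: `toy_embed` is a hypothetical stand-in for a CLIP text encoder, the direction is assumed to be target minus source, and the dictionary keys `prompt` and `negative_prompt` are illustrative labels for the two prompt slots rather than confirmed EditingPipeline argument names.

```python
def toy_embed(prompt, dim=4):
    """Hypothetical stand-in for a CLIP text encoder (deterministic toy vector)."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def prompt_pair_direction(source_prompt, target_prompt, embed=toy_embed):
    """Step 2: difference between the two prompt embeddings.
    Assumption: the direction points from source to target."""
    src = embed(source_prompt)
    tgt = embed(target_prompt)
    return [t - s for s, t in zip(src, tgt)]

def build_edit_inputs(source_prompt, target_prompt, embed=toy_embed):
    """Steps 1-3 combined: pick the prompts, compute edit_dir, and assign
    the target prompt as the regular prompt and the source as the negative."""
    return {
        "prompt": target_prompt,           # illustrative key name
        "negative_prompt": source_prompt,  # illustrative key name
        "edit_dir": prompt_pair_direction(source_prompt, target_prompt, embed),
    }

source = "a painting of a cat sitting on top of a blue ball"
target = "a painting of a dog sitting on top of a blue ball"
inputs = build_edit_inputs(source, target)
```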

pix2pixzero commented 1 year ago

Hi,

Thank you for your interest in this work! The approach you mentioned makes sense. However, I would try setting the target prompt also as the source prompt for real-image editing. Let me know if this does not result in the desired results!

Regards, Authors

apolinario commented 1 year ago

Hi! Thank you for getting back to me. Here are my experiments:

Source Image: cat_1

Source prompt generated by BLIP: a painting of a cat sitting on top of a blue ball

Target prompts:

a painting of a dog sitting on top of a blue ball
a painting of a capybara sitting on top of a blue ball

Strategy 1

Subtract "target prompt" embeddings from "source prompt" embeddings
Enter the source prompt as both the main and the negative prompts in the generation pipeline

Strategy 2

Subtract "target prompt" embeddings from "source prompt" embeddings
Enter the target prompt as the main prompt and the source prompt as the negative prompt in the generation pipeline

Strategy 3

Subtract "target prompt" embeddings from "source prompt" embeddings
Enter the target prompt as the main prompt and just the word cat as the negative prompt
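
Since all three strategies use the same edit direction, they differ only in which text goes into the main and negative prompt slots. A small sketch of that comparison, with `Strategy` and `build_strategies` as hypothetical helper names introduced only for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strategy:
    name: str
    prompt: str           # main prompt given to the generation pipeline
    negative_prompt: str  # negative prompt given to the generation pipeline

def build_strategies(source_prompt, target_prompt, source_word):
    """Return the three prompt configurations compared in the experiments."""
    return [
        Strategy("Strategy 1", source_prompt, source_prompt),
        Strategy("Strategy 2", target_prompt, source_prompt),
        Strategy("Strategy 3", target_prompt, source_word),
    ]

source = "a painting of a cat sitting on top of a blue ball"
target = "a painting of a dog sitting on top of a blue ball"
strategies = build_strategies(source, target, "cat")

for s in strategies:
    print(f"{s.name}: prompt={s.prompt!r}, negative={s.negative_prompt!r}")
```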

So indeed, strategy 1, which you suggested, seems to work best! Thanks
