nerdyrodent / VQGAN-CLIP

Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Idea, if we're being extra arty about videos. #48

Open HughPH opened 2 years ago

HughPH commented 2 years ago

Another change I've made for myself is to break every n iterations (after checkin) and await user input. If I input Y it reloads the image from disk and reinitialises the optimiser (the same as you do for a zoom video). This way I can "guide" it quite forcefully: if I want a skull with glowing blue eyes, and the blue eyes are not picked up from the init image (or have dissolved into nothing) by the 50th step, I can paint them in. I can also "promote" features in the output by exaggerating their presence.

[image: example output from a manually guided run]

Since we're reinitialising the optimiser, we can presumably also switch up the prompts 'in the middle' of the run, when loss has 'stabilised'? Depending on how far you want to take this (and I'll be doing my own experimentation) maybe we can draw up a timeline and construct a video based on prompts that change over time.
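Something like this might work for the timeline, reusing the names generate.py already has (pMs, perceptor, clip, Prompt, device); prompt_schedule and maybe_update_prompts are just illustrative names, not code from the repo:

# Hypothetical prompt timeline: iteration number -> new text prompt
prompt_schedule = {
    0:   "a human skull with glowing blue eyes",
    200: "a shiny metal robot face with glowing blue eyes",
}

def maybe_update_prompts(i):
    # Call once per iteration from the main loop (sketch only)
    if i in prompt_schedule:
        pMs.clear()  # drop the previously encoded prompts
        tokens = clip.tokenize(prompt_schedule[i]).to(device)
        embed = perceptor.encode_text(tokens).float()
        pMs.append(Prompt(embed).to(device))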

nerdyrodent commented 2 years ago

Sounds fun. A bit like story mode, but more interactive.

lucasantana commented 2 years ago

What inputs did you use for this outcome? Looks cool!!

HughPH commented 2 years ago

> What inputs did you use for this outcome? Looks cool!!

Thanks, this was "manually guided". I can't remember exactly what the prompt was, something along the lines of "a shiny metal robot face with glowing blue eyes", but I started with an initial image of a human skull with roughly drawn blue eyeballs (just two circles with a black blob in the middle for a pupil and a couple of highlights for reflections). Then on each call to checkin, I break and await user input. At that point I can check if the image is going how I want, and if it's not I can load it in Krita or Pinta or something and roughly "repair" any features that are not going quite as I like. Just a thick brush with a solid colour is usually sufficient, but I might also select an area and copy it, or stretch or rotate a section, or use Krita's Heal tool to erase a feature. It doesn't need any artistic skill.

matteofedericopazienza commented 2 years ago

That's amazing! How did you do that? Could you share the code? Thanks!

giantmonster commented 2 years ago

This looks very much like something I would use! It's a great idea either way!

HughPH commented 2 years ago

(Almost) all the code you need to do this is already in generate.py.

The first thing I did was add another command line argument:

vq_parser.add_argument("-jr",   "--justrun", action="store_true", help="Just run, no breaks", dest="just_run")

Next, I modified the main loop so that if the just_run argument has not been passed and the iteration number is a multiple of the display_freq argument, the code waits for input. During this wait you can modify the image that was dumped when checkin() was called from train(). Then, if you enter "Y", the image is reloaded and the flag to reset the optimizer is set to True. See the make_zoom_video if statement further down in generate.py for the same image-reloading code, with comments.

try:
    resetOptimizer = False
    with tqdm() as pbar:
        while True:

            train(i)

            if not args.just_run and i % args.display_freq == 0:
                print(f"Modify output{i}.png, then type Y and press Enter, or just press Enter if no change was made")
                y = input()
                if y == 'Y':
                    # Reload the (possibly hand-edited) image and re-encode it into z,
                    # the same way the make_zoom_video branch does
                    img = Image.open(f"output{i}.png")
                    pil_image = img.convert('RGB')
                    pil_image = pil_image.resize((sideX, sideY), Image.LANCZOS)
                    pil_tensor = TF.to_tensor(pil_image)
                    z, *_ = model.encode(pil_tensor.to(device).unsqueeze(0) * 2 - 1)
                    z_orig = z.clone()
                    z.requires_grad_(True)
                    resetOptimizer = True

            # Further down (not shown): when resetOptimizer is True, re-create the
            # optimiser as the make_zoom_video branch does, then clear the flag

If you want to run without waiting for input, pass -jr on the command line and the original behaviour is restored.
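For example (assuming the existing -p prompt and -ii init-image flags; skull.png is a hypothetical starting image):

python generate.py -p "a shiny metal robot face with glowing blue eyes" -ii skull.png
python generate.py -p "a shiny metal robot face with glowing blue eyes" -ii skull.png -jr

The first run pauses at every checkin; the second never pauses.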

HughPH commented 2 years ago

Just a quick note: if you're using the -o command line option, f"output{i}.png" won't work; you need to replace it with args.output + str(i) + ".png". Just been hit with my own bug :)
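If you'd rather keep the extension in the right place, something like this should work (just a sketch; it assumes args.output is a filename such as "myrun.png"):

import os

# Put the iteration number before the extension, e.g. myrun.png -> myrun5.png
base, ext = os.path.splitext(args.output)
checkin_name = f"{base}{i}{ext or '.png'}"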