HughPH opened 2 years ago
Sounds fun. A bit like story mode, but more interactive.
What inputs did you use for this outcome? Looks cool!!
Thanks, this was "manually guided". I can't remember exactly what the prompt was; something along the lines of "a shiny metal robot face with glowing blue eyes". But I started with an initial image of a human skull with roughly drawn blue eyeballs (just two circles with a black blob in the middle for a pupil and a couple of highlights for reflections).

Then on each call to `checkin`, I break and await user input. At that point I can check whether the image is going the way I want, and if it isn't I can load it in Krita or Pinta or something and roughly "repair" any features that aren't developing as I'd like. A thick brush with a solid colour is usually sufficient, but I might also select and copy an area, stretch or rotate a section, or use Krita's Heal tool to erase a feature. It doesn't need any artistic skill.
That's amazing! How did you do that? Could you share the code? Thanks!
This looks very much like something I would use! It's a great idea either way!
(Almost) all the code you need to do this is already in `generate.py`.
The first thing I did was add another command line argument:
```python
vq_parser.add_argument("-jr", "--justrun", action="store_true", help="Just run, no breaks", dest="just_run")
```
Next, I modified the main loop so that, if the `just_run` argument has not been passed and the number of iterations is a multiple of the `display_freq` argument, the code waits for input. During this wait, you can modify the image that was dumped when `checkin()` was called from `train()`. Then, if you enter "Y", the image is reloaded and the flag to reset the optimizer is set to `True`. See the `if` statement further down for the `make_zoom_video` argument for the same image-reloading code with comments.
```python
try:
    resetOptimizer = False
    with tqdm() as pbar:
        while True:
            train(i)
            if not args.just_run and i % args.display_freq == 0:
                print(f"Modify output{i}.png and press Y then Enter, or just Enter if no change was made")
                y = input()
                if y == 'Y':
                    # Reload the (possibly edited) image and re-encode it as the new latent
                    img = Image.open(f"output{i}.png")
                    pil_image = img.convert('RGB')
                    pil_image = pil_image.resize((sideX, sideY), Image.LANCZOS)
                    pil_tensor = TF.to_tensor(pil_image)
                    z, *_ = model.encode(pil_tensor.to(device).unsqueeze(0) * 2 - 1)
                    z_orig = z.clone()
                    z.requires_grad_(True)
                    resetOptimizer = True
```
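The snippet above only sets `resetOptimizer`; presumably the loop then rebuilds the optimizer around the re-encoded `z` before the next step. A minimal sketch of that consumption follows; the tensor shape, `step_size` value, and variable names are stand-ins for this illustration, not necessarily what generate.py uses:

```python
import torch
from torch import optim

# Stand-ins for generate.py's state (assumptions for this sketch)
z = torch.zeros(1, 256, 16, 16, requires_grad=True)  # latent being optimised
step_size = 0.1                                       # learning rate

resetOptimizer = True  # set after the edited image was re-encoded

if resetOptimizer:
    # Recreate the optimiser so Adam's moment estimates don't drag z
    # back toward the pre-edit image.
    opt = optim.Adam([z], lr=step_size)
    resetOptimizer = False
```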
If you want to run without waiting for input, you can pass `-jr` on the command line, and the original behaviour is restored.
Just a quick note: if you're using the `-o` command line option, `f"output{i}.png"` won't work; you need to replace it with `args.output + str(i) + ".png"`.
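A small helper could cover both cases; this is a hypothetical sketch, assuming `args.output` holds whatever base name was passed via `-o` (and that it may or may not carry a `.png` extension):

```python
def checkin_filename(args, i):
    """Build the per-iteration output filename from args.output.

    Hypothetical helper: strips a trailing .png so "-o face.png"
    yields face0.png, face50.png, ... rather than face.png50.png.
    """
    base = getattr(args, "output", "output") or "output"
    if base.endswith(".png"):
        base = base[:-4]
    return f"{base}{i}.png"
```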
Just been hit with my own bug :)
Another change I've made for myself is to break every n iterations (after checkin) and await user input. If I input "Y", it reloads the image from disk and reinitialises the optimiser (the same as you do for a zoom video). This way I can "guide" it quite forcefully: if I want a skull with glowing blue eyes, and the blue eyes are not picked up from the init image (or have dissolved into nothing) by the 50th step, I can paint them in. I can also "promote" features in the output by exaggerating their presence.

Since we're reinitialising the optimiser, we can presumably also switch up the prompts 'in the middle' of the run, once the loss has 'stabilised'? Depending on how far you want to take this (and I'll be doing my own experimentation), maybe we can draw up a timeline and construct a video based on prompts that change over time.
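The timeline idea could be as simple as a schedule mapping a start iteration to a prompt, consulted at each checkin; the schedule contents, names, and lookup function here are hypothetical, not part of generate.py:

```python
# Hypothetical prompt timeline: start iteration -> prompt text.
prompt_schedule = {
    0:   "a human skull with glowing blue eyes",
    150: "a shiny metal robot face with glowing blue eyes",
    300: "a chrome android portrait, studio lighting",
}

def prompt_for_iteration(i, schedule):
    """Return the most recent prompt whose start iteration is <= i."""
    active = [start for start in schedule if start <= i]
    return schedule[max(active)] if active else None
```

Each time the active prompt changes, you would re-embed it with CLIP and reset the optimiser, just as for a manual image edit.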