mehdidc / feed_forward_vqgan_clip

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt
MIT License

How to get more variation in the null image #27

Open kchodorow opened 2 years ago


I've been generating images using this model, which is delightfully fast, but I've noticed that it produces images that are all alike. I tried generating the "null" image by doing:

```python
H = perceptor.encode_text(toks.to(device)).float()
z = net(0 * H)  # zero out the text embedding to get the model's "null" latent
```

This resulted in:

[image: the "null" base image]

And indeed, everything I generated kind of matched that: you can see the fleshly protrusion on the left in "gold coin":

[image: "gold coin" output]

The object and matching mini-object in "tent":

[image: "tent" output]

And it always seems to try to caption the image with nonsense lettering ("lion"):

[image: "lion" output]

So I'm wondering if there's a way to "prime" the model and suggest it use a different zero image for each run. Is there a variable I can set, or is this deeply ingrained in the training data?
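One thing I've considered (this is just a sketch, not something from the repo): since the model is a deterministic feed-forward map from the CLIP embedding to a latent, any variation per run would have to come from perturbing the embedding itself, e.g. adding Gaussian noise before calling `net`. Below, `net` is a stand-in linear layer (the real model maps 512-dim ViT-B/32 text embeddings to VQGAN latents), and `noise_scale` is a hypothetical knob I made up:

```python
import torch

torch.manual_seed(0)

# Stand-in for the feed-forward net; the real model maps CLIP text
# embeddings (512-dim for ViT-B/32) to VQGAN latents.
net = torch.nn.Linear(512, 256)

# A zero embedding always produces the exact same "null" latent.
H = torch.zeros(1, 512)
z_base = net(H)

# Hypothetical workaround: perturb the embedding with Gaussian noise
# so each run starts from a slightly different latent.
noise_scale = 0.1
z_varied = net(H + noise_scale * torch.randn_like(H))
```

I don't know if small perturbations actually translate into meaningfully different images here, or whether the decoder collapses them back to the same base image, which is why I'm asking.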

Any advice would be appreciated, thank you!

(Apologies if this is the same as #8, but it sounded like #8 was solved by using priors which doesn't seem to help with this.)