lucidrains / deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
MIT License

Image priming gives "The size of tensor a (224) must match the size of tensor b (512) at non-singleton dimension 3" #100

Closed by nerdyrodent 3 years ago

nerdyrodent commented 3 years ago

Hi, unless I'm missing something obvious, it looks like image priming no longer works. E.g.:

imagine "A pizza on fire" --open_folder=False --start_image_path=./samples/prime-orig.jpg

siren_pytorch/siren_pytorch.py:101: UserWarning: Using a target size (torch.Size([1, 3, 512, 512])) that is different to the input size (torch.Size([1, 3, 224, 224]))
...
RuntimeError: The size of tensor a (224) must match the size of tensor b (512) at non-singleton dimension 3

jack7803m commented 3 years ago

I'm relatively new to this so I could be wrong, but can you check what the resolution of your priming image is? It seems like you need a priming image that is the same resolution as what you're trying to output (512x512 in this case).

nerdyrodent commented 3 years ago

Yup! Tried images of various sizes including 224x224, 512x512 and others.

jack7803m commented 3 years ago

So did that not fix it? The tensor error it's giving you looks like the dimensions of the image. If you use a 512x512 priming image what error does it throw?

nerdyrodent commented 3 years ago

Same error every time. It can be worked around by using "--image_width=224". I'm not that great at coding, so I'm still looking through the rest of the code to see how it handles different image sizes.
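
For reference, the error message matches PyTorch's broadcasting rule: scanning from the last dimension backwards, two sizes are compatible only if they are equal or one of them is 1. Below is a minimal pure-Python sketch of that check. The function name `first_broadcast_conflict` is made up for illustration and is not part of deep-daze, siren_pytorch, or PyTorch. The two shapes are copied from the warning above.

```python
def first_broadcast_conflict(shape_a, shape_b):
    """Return the index of the first dimension (scanning from the last one
    backwards) where the two shapes cannot be broadcast together, or None
    if they are compatible. Hypothetical helper, for illustration only."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    for dim in range(ndim - 1, -1, -1):
        # Sizes are compatible if they match or if either one is 1.
        if a[dim] != b[dim] and a[dim] != 1 and b[dim] != 1:
            return dim
    return None


# Shapes taken from the warning in the traceback.
model_output = (1, 3, 224, 224)
priming_image = (1, 3, 512, 512)

conflict = first_broadcast_conflict(model_output, priming_image)
print(conflict)

# With matching sizes (the effect of --image_width=224) nothing conflicts.
print(first_broadcast_conflict((1, 3, 224, 224), (1, 3, 224, 224)))
```

Under this rule 224 and 512 clash in the last dimension (index 3), which is the dimension the RuntimeError reports. It would also explain why the error stays the same whatever the priming image's resolution is, if the two tensors being compared are always produced at 224 and at the requested output width respectively. That last part is an inference from the log, not something checked against the code.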

NotNANtoN commented 3 years ago

The PR above (#103) fixes the crash, so priming runs again, but the generated images come out completely white for large start_image_steps. The repo already had this white-image problem when I first checked it out, so we are at least back to the old, still-broken behavior.

@nerdyrodent what you can do instead is pass your image as img to the Imagine class, along with your text. Deep Daze will then optimize for your image's features too. That's quite different from image priming, but still interesting.