kjsman / stable-diffusion-pytorch

Yet another PyTorch implementation of Stable Diffusion (probably easy to read)

Image in latent space gets shifted during encoding. #8

Closed: treeform closed this issue 1 year ago

treeform commented 1 year ago

I am using a simple red image as input:

[image: red.png, a solid red input image]

from stable_diffusion_pytorch import pipeline
from PIL import Image

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('red.png')]
images = pipeline.generate(prompts, input_images=input_images)
images[0].save('output.png')

But I am getting the input image shifted down and to the right by 8px, and it generates an ugly brown border:

[image: output.png, shifted down-right with a brown border]

I am pretty sure it happens during the encode pass, as it is already shifted in latent space. Here is a custom dump of the latent space to an image:

[image: encode/decode round-trip dump of the latent space]

Something in the encode pass is shifting it by one pixel in latent space, and I can't figure out what.

kjsman commented 1 year ago

Wow, nice catch! It turns out that I implemented the encoder's padding differently from the original (compare kjsman/stable-diffusion-pytorch with CompVis/stable-diffusion).
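
For illustration, here is a minimal, self-contained toy (a sketch of the mechanism, not code from either repo): a 3x3 stride-2 conv whose kernel reads only the center of each window, run once with symmetric padding and once with CompVis-style right/bottom-only padding. The two sampling grids land one pixel apart, which the VAE's 8x upsampling turns into the 8px shift you saw:

import torch
import torch.nn.functional as F

x = torch.arange(64, dtype=torch.float32).reshape(1, 1, 8, 8)
w = torch.zeros(1, 1, 3, 3)
w[0, 0, 1, 1] = 1.0  # kernel that picks the center pixel of each 3x3 window

sym = F.conv2d(F.pad(x, (1, 1, 1, 1)), w, stride=2)   # symmetric pad (what I had)
asym = F.conv2d(F.pad(x, (0, 1, 0, 1)), w, stride=2)  # right/bottom pad (CompVis)

print(sym[0, 0, 0])   # tensor([0., 2., 4., 6.])    -> samples even-indexed pixels
print(asym[0, 0, 0])  # tensor([ 9., 11., 13., 15.]) -> samples odd-indexed pixels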

The quick fix -- removing the padding from the downsampling Conv2d layer and applying the pad in the forward method instead (because Conv2d's padding argument does not support asymmetric padding) -- will be pushed soon.
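
In sketch form (a minimal example of the approach, not the exact class from this repo; Downsample and channels are illustrative names), the quick fix looks like:

import torch.nn.functional as F
from torch import nn

class Downsample(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # No built-in padding: Conv2d's padding argument is symmetric only.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=0)

    def forward(self, x):
        # Pad (left, right, top, bottom) = (0, 1, 0, 1): right/bottom only,
        # matching the original CompVis downsampling.
        x = F.pad(x, (0, 1, 0, 1), mode="constant", value=0)
        return self.conv(x)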

The better fix -- adding an nn.ZeroPad2d layer -- requires revising the weight file. The problem is, I lost the weight conversion script (this might also answer your prior issue). It was on my old laptop, I migrated to a new laptop without that script, and then I erased the old one. I think I should rewrite the script soon, but I can't promise an ETA...
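
For reference, the nn.ZeroPad2d version would look roughly like this (again a sketch under my naming, not the repo's exact code; the likely reason it needs a revised weight file is that wrapping the conv in nn.Sequential renames its state_dict keys, e.g. conv.weight becomes conv.1.weight):

from torch import nn

channels = 512  # illustrative value
down = nn.Sequential(
    nn.ZeroPad2d((0, 1, 0, 1)),  # (left, right, top, bottom): pad right/bottom only
    nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=0),
)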

treeform commented 1 year ago

Thank you, the fix works great!

treeform commented 1 year ago

Also, about the weights: I think I have figured that out: https://github.com/kjsman/stable-diffusion-pytorch/issues/7#issuecomment-1426839447