treeform closed this issue 1 year ago.
Wow, nice catch! It turns out I implemented the encoder's padding differently from the original (compare kjsman/stable-diffusion-pytorch with CompVis/stable-diffusion).
The quick fix -- removing the padding from the downsampling Conv2d layer and applying it manually in the forward method (since PyTorch's Conv2d does not support asymmetric padding) -- will be pushed soon.
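A minimal sketch of what that quick fix could look like (the `Downsample` class name and channel counts here are hypothetical, not taken from the repo): the Conv2d keeps `padding=0`, and the asymmetric right/bottom zero-pad is applied with `F.pad` in `forward`, matching the original encoder's behavior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Downsample(nn.Module):
    """Stride-2 downsampling with asymmetric padding done in forward()."""

    def __init__(self, channels):
        super().__init__()
        # padding=0 here; the asymmetric pad is applied manually below
        self.conv = nn.Conv2d(channels, channels,
                              kernel_size=3, stride=2, padding=0)

    def forward(self, x):
        # pad (left, right, top, bottom) = (0, 1, 0, 1):
        # one zero pixel on the right and bottom edges only
        x = F.pad(x, (0, 1, 0, 1))
        return self.conv(x)

x = torch.randn(1, 4, 64, 64)
y = Downsample(4)(x)
# spatial dims are halved: 64 -> floor((65 - 3) / 2) + 1 = 32
```

Because `F.pad` is a function rather than a module, this version changes no parameter names, so the existing weight file still loads.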
The better fix -- adding an nn.ZeroPad2d layer -- requires revising the weight file. The problem is that I lost the weight-conversion script (this may also answer your earlier issue). It was on my old laptop; I migrated to a new laptop without it and then erased the old one. I should rewrite the script soon, but I can't promise an ETA...
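For illustration, a hypothetical sketch of that better fix (the names and channel counts below are assumptions, not the repo's actual layout): the same asymmetric pad expressed as an explicit `nn.ZeroPad2d` layer. Since the extra layer appears in the module tree, the state-dict keys shift (here the conv's weight becomes `1.weight` inside the `Sequential`), which is why the weight file would need reconversion.

```python
import torch
import torch.nn as nn

# nn.ZeroPad2d takes (left, right, top, bottom); pad right/bottom only
downsample = nn.Sequential(
    nn.ZeroPad2d((0, 1, 0, 1)),
    nn.Conv2d(4, 4, kernel_size=3, stride=2, padding=0),
)

y = downsample(torch.randn(1, 4, 64, 64))
# spatial dims halved, same output as padding in forward()
```

Functionally the two fixes are identical; the layer-based one just makes the padding visible in the architecture at the cost of renaming state-dict keys.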
Thank you, the fix works great!
Also, about the weights, I think I have figured that out: https://github.com/kjsman/stable-diffusion-pytorch/issues/7#issuecomment-1426839447
I am using a simple red image as input:
But the output image is shifted down and to the right by 8px, 8px, and it generates an ugly brown border:
I am pretty sure it happens during the `Encode` pass, since the image is already shifted in latent space. Here is a custom dump of the latent space to an image: something in the `Encode` pass is shifting it by one pixel in the latent space, and I can't figure out what.