Open leejielong opened 1 year ago
Yes, I also tried this and saw a result similar to the third image. I am not sure whether it is because sd-2-1 is trained on 768x768 images while the 2D case uses a 512x512 canvas. I tried to optimize a 768x768 image instead, but that ran out of memory (OOM).
I agree this is caused by the higher resolution of sd-2-1, as mentioned by @thuliu-yt16. I tried SDS with sd-2-1 and could not get results as good as those from sd-2-1-base.
Hi! I tried using a v-prediction model (SD 2-1) for VSD guidance as suggested in #179, and I also applied a v-to-epsilon conversion to the UNet predictions. I ran this configuration in the 2D playground pipeline, but the results are extremely poor. Has anyone else observed this too? Below are some 2D generated samples from VSD:
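For reference, the v-to-epsilon conversion mentioned above can be sketched as follows. This is a minimal illustration, not the code used in this repo: it assumes the standard variance-preserving parameterization, where `x_t = alpha_t * x0 + sigma_t * eps` and `v = alpha_t * eps - sigma_t * x0` with `alpha_t**2 + sigma_t**2 == 1`. The function name `v_to_epsilon` and the flat-list inputs are hypothetical; a real pipeline would operate on tensors and read `alpha_bar_t` from the scheduler.

```python
import math


def v_to_epsilon(v, x_t, alpha_bar_t):
    """Convert a v-prediction into a noise (epsilon) prediction.

    Assumes x_t = alpha_t * x0 + sigma_t * eps and
    v = alpha_t * eps - sigma_t * x0, so that
    eps = alpha_t * v + sigma_t * x_t.
    """
    alpha_t = math.sqrt(alpha_bar_t)        # signal scale at timestep t
    sigma_t = math.sqrt(1.0 - alpha_bar_t)  # noise scale at timestep t
    return [alpha_t * v_i + sigma_t * x_i for v_i, x_i in zip(v, x_t)]


# Round-trip check on made-up values (illustrative only, not real latents).
x0 = [0.5, -1.0, 2.0]
eps = [0.1, 0.3, -0.7]
alpha_bar_t = 0.64
alpha_t = math.sqrt(alpha_bar_t)
sigma_t = math.sqrt(1.0 - alpha_bar_t)

x_t = [alpha_t * a + sigma_t * e for a, e in zip(x0, eps)]  # noised sample
v = [alpha_t * e - sigma_t * a for a, e in zip(x0, eps)]    # v target

recovered = v_to_epsilon(v, x_t, alpha_bar_t)
```

If `recovered` matches `eps` up to floating-point error, the conversion itself is consistent, which would point to something else (for example the resolution mismatch) as the cause of the poor samples.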