threestudio-project / threestudio

A unified framework for 3D content generation.
Apache License 2.0

Poor sample quality when using V-prediction model for unet in VSD #278

Open leejielong opened 1 year ago

leejielong commented 1 year ago

Hi! I tried using a v-prediction model (SD 2-1) for VSD guidance, as suggested in #179, and applied the v-to-epsilon conversion to the UNet predictions as well. I ran this configuration in a 2D playground pipeline, but the results are extremely poor. Has anyone else observed this? Below are some 2D samples generated with VSD:

Picture 1
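
For readers unfamiliar with the v-to-epsilon conversion mentioned above, here is a minimal sketch. It assumes the usual variance-preserving forward process and v-parameterization, so it is not necessarily the exact code used for the run above. The function names and the `alpha_bar_t` argument (the cumulative alpha at the sampled timestep) are illustrative and are not taken from threestudio. Only elementwise arithmetic is used, so the same formula should apply unchanged to tensors.

```python
import math


def make_noisy_sample_and_v(x_0, eps, alpha_bar_t):
    """Build x_t and the v target from a clean sample and noise.

    Assumed forward process (variance preserving):
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    Assumed v-parameterization:
        v   = sqrt(alpha_bar_t) * eps - sqrt(1 - alpha_bar_t) * x_0
    """
    alpha_t = math.sqrt(alpha_bar_t)
    sigma_t = math.sqrt(1.0 - alpha_bar_t)
    x_t = alpha_t * x_0 + sigma_t * eps
    v = alpha_t * eps - sigma_t * x_0
    return x_t, v


def v_to_epsilon(v, x_t, alpha_bar_t):
    """Recover the noise prediction from a v-prediction.

    Follows from the two definitions above together with
    alpha_t**2 + sigma_t**2 == 1:
        eps = alpha_t * v + sigma_t * x_t
    """
    alpha_t = math.sqrt(alpha_bar_t)
    sigma_t = math.sqrt(1.0 - alpha_bar_t)
    return alpha_t * v + sigma_t * x_t


# Round trip on scalars: the recovered noise should match the original.
x_t, v = make_noisy_sample_and_v(x_0=0.3, eps=-1.2, alpha_bar_t=0.7)
eps_recovered = v_to_epsilon(v, x_t, alpha_bar_t=0.7)
```

In VSD both the pretrained UNet and the LoRA UNet would need this conversion if both are v-prediction models. Whether that matches the setup used here is not stated in the thread.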

thuliu-yt16 commented 1 year ago

Yes, I also tried this and saw results similar to the third image. I am not sure whether this is because sd-2-1 was trained on 768x768 images while the 2D case uses a 512x512 canvas. I tried optimizing a 768x768 image instead, but it ran out of memory (OOM).
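
As a rough illustration of why the 768x768 case costs so much more, the sketch below compares latent sizes at the two resolutions. It assumes the standard Stable Diffusion VAE (8x spatial downsampling, 4 latent channels). The helper names are hypothetical, and the attention estimate is only a scaling argument for naive self-attention, not a measurement of the actual run.

```python
VAE_DOWNSAMPLE = 8   # assumed spatial downsampling factor of the SD VAE
LATENT_CHANNELS = 4  # assumed number of latent channels


def latent_shape(image_size):
    """Latent (channels, height, width) for a square image of the given size."""
    side = image_size // VAE_DOWNSAMPLE
    return (LATENT_CHANNELS, side, side)


def spatial_tokens(image_size):
    """Number of spatial positions in the latent at full resolution."""
    _, h, w = latent_shape(image_size)
    return h * w


tokens_512 = spatial_tokens(512)  # 64 * 64 positions
tokens_768 = spatial_tokens(768)  # 96 * 96 positions

# Activations grow linearly with the number of positions; a naive
# self-attention map grows with its square.
token_ratio = tokens_768 / tokens_512
attention_ratio = token_ratio ** 2

print(latent_shape(512), latent_shape(768))
print(f"positions: x{token_ratio:.2f}, naive attention entries: x{attention_ratio:.2f}")
```

Under these assumptions the 768x768 latent has 2.25 times as many positions, and a naive attention map has roughly 5 times as many entries, which is consistent with hitting OOM when the 512x512 case only just fits.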

yuanzhi-zhu commented 1 year ago

I agree that this is caused by the higher native resolution of sd-2-1, as mentioned by @thuliu-yt16. I tried SDS with sd-2-1 and could not get results as good as those from sd-2-1-base.