hejiaxiang1 opened 3 months ago
I have a similar issue. I tried to visualize the DDPM uncertainty of ImageNet generations, but the resulting image is not very meaningful.
In Section 4.3, you write: "we sample a variety of latent states ... estimate the empirical variance ... as the final pixel-wise uncertainty".
How exactly did you sample? Did you draw Gaussian samples around the final exp_xt?
Thank you for your time and help!
Thanks for your interest in our work!
For variance visualization of Stable Diffusion in the latent space, we save $E(z_0)$ and $\mathrm{Var}(z_0)$ (exp_xt_next and var_xt_next in the xxUQ.py script) and resample $z_{0,1}, \ldots, z_{0,N}$ from the Gaussian distribution $\mathcal{N}(E(z_0), \mathrm{Var}(z_0))$. Then we decode them to $x_{0,1}, \ldots, x_{0,N}$ and estimate the empirical variance as the final pixel-wise variance.
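As a side note, `torch.var` uses the unbiased estimator by default, so the pixel-wise estimate computed in the script below is $\widehat{\mathrm{Var}}(x_0) = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_{0,i} - \bar{x}_0\right)^2$, where $\bar{x}_0 = \frac{1}{N}\sum_{i=1}^{N} x_{0,i}$ is the sample mean of the decoded images.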
For the visualization code, you can refer to the script below. Feel free to ask if you have any further questions.
```python
import os

import torch
from matplotlib import pyplot as plt
from ldm.util import instantiate_from_config
from omegaconf import OmegaConf
from torchvision import transforms
import torchvision.utils as tvu

to_pil = transforms.ToPILImage()


def load_model_from_config(config, ckpt, verbose=False):
    print(f"Loading model from {ckpt}")
    pl_sd = torch.load(ckpt, map_location="cpu")
    if "global_step" in pl_sd:
        print(f"Global Step: {pl_sd['global_step']}")
    sd = pl_sd["state_dict"]
    model = instantiate_from_config(config.model)
    m, u = model.load_state_dict(sd, strict=False)
    if len(m) > 0 and verbose:
        print("missing keys:")
        print(m)
    if len(u) > 0 and verbose:
        print("unexpected keys:")
        print(u)
    model.eval()
    return model


config = OmegaConf.load("configs/stable-diffusion/v1-inference.yaml")
device = torch.device("cuda:5")
model = load_model_from_config(config, "your_local_sd_ckpt").to(device)

# Load the saved latent mean E(z_0) and variance Var(z_0)
z_dev_list = []
z_exp_list = []
exp_dir = 'your_local_exp_dir'
id = 0
z_var_i = torch.load(f'{exp_dir}/z_var/{id}.pth')
z_exp_i = torch.load(f'{exp_dir}/z_exp/{id}.pth')
# Standard deviation; the clamp guards against tiny negative variance values
z_dev_i = torch.clamp(z_var_i, min=0) ** 0.5
z_dev_list.append(z_dev_i)
z_exp_list.append(z_exp_i)


def get_dev_x_from_z(dev, exp, N):
    # Draw N samples from the latent Gaussian N(E(z_0), Var(z_0));
    # randn_like (not rand_like), so the noise is Gaussian rather than uniform
    z_list = []
    for i in range(N):
        z_list.append(exp + torch.randn_like(exp) * dev)
    # Decode the latents into pixel space and take the empirical statistics
    Z = torch.stack(z_list, dim=0)
    X = model.decode_first_stage(Z.to(device))
    var_x = torch.var(X, dim=0)
    exp_x = torch.mean(X, dim=0)
    dev_x = var_x ** 0.5
    return dev_x


os.makedirs(f'{exp_dir}/x_dev', exist_ok=True)
N = 15
for index in range(1):
    z_dev = z_dev_list[index]
    z_exp = z_exp_list[index]
    dev_x = get_dev_x_from_z(z_dev, z_exp, N)
    tvu.save_image(dev_x * 100, f'{exp_dir}/x_dev/{id}.jpg')
```
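One note on the last line: the `*100` factor is just a fixed brightness scale so the standard-deviation map is visible when saved as an image. If the map comes out blank or saturated on another dataset, a per-image min-max normalization may be easier to read. This is only a sketch of an alternative display step, not part of the original script:

```python
# Sketch (not in the original script): normalize the std map per image
# instead of using a fixed *100 brightness factor.
# The epsilon guards against division by zero when the map is constant.
dev_vis = (dev_x - dev_x.min()) / (dev_x.max() - dev_x.min() + 1e-8)
tvu.save_image(dev_vis, f'{exp_dir}/x_dev/{id}_norm.jpg')
```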
Hi! I have successfully created uncertainty maps for Stable Diffusion. However, the uncertainty maps I generated for ddpm_and_guided by visualizing the variance do not align with the results in your paper. Could you kindly provide the visualization code for this? Thank you in advance.
Hi👋~ Thank you for your interest in our work! For CELEBA uncertainty visualization with the DDIM sampler, you can try the Python script ./ddpm_and_guided/ddim_skipUQ_visualization.py together with this bash configuration:
DEVICES="5"
data="celeba"
steps="100"
mc_size="10"
sample_batch_size="16"
total_n_sample="16"
train_la_data_size="5000"
DIS="uniform"
fixed_class="10"
seed=123
CUDA_VISIBLE_DEVICES=$DEVICES python ddim_skipUQ_visualization.py \
--config $data".yml" --timesteps=$steps --skip_type=$DIS --train_la_batch_size 32 \
--mc_size=$mc_size --sample_batch_size=$sample_batch_size --fixed_class=$fixed_class --train_la_data_size=$train_la_data_size \
--total_n_sample=$total_n_sample --fixed_class=$fixed_class --seed=$seed
Thank you for your fast reply! However, when running the given visualization code on ImageNet instead of CELEBA (with the specifications of the last post as well as the standard specifications from the ddim.sh file), the generated uncertainty maps still don't make much sense. Do you have any pointers as to why this might be? Or does the visualization code for ImageNet differ from the visualization code of CELEBA? Thanks in advance!
Hi @cilevanmarken ~
For ImageNet visualization, as the dataset grows you need to increase train_la_data_size. This means fitting the posterior distribution on less data (roughly #total_dataset_size / train_la_data_size samples), which yields a larger variance. For example, you will get the results shown in the attached images after changing train_la_data_size=500000 in the bash script above.
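Concretely, the number of samples used to fit the posterior is #total_dataset_size / train_la_data_size; the helper below is just for illustration (its name and signature are mine, not from the repo):

```python
# Illustration only: number of samples used to fit the Laplace posterior,
# following the relationship described above.
def n_fit_samples(total_dataset_size: int, train_la_data_size: int) -> int:
    return max(1, total_dataset_size // train_la_data_size)

print(n_fit_samples(1_281_167, 500_000))  # ImageNet-1k train set -> 2 samples
print(n_fit_samples(162_770, 5_000))      # CelebA train split -> 32 samples
```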
Hi! I would like to ask a question: if I switch to a dataset that contains only 200 images, what value should I set for train_la_data_size? Or would such a small dataset lead to suboptimal results?
I tried to visualize the variance part, but the output contains no useful information. My modified /sd/dpmsolver_skipUQ.py code is as follows: