hatchetProject / QuEST

QuEST: Efficient Finetuning for Low-bit Diffusion Models

Reason for eta=0.0 in the imagenet script? #13

Closed: adilhasan927 closed 1 week ago

adilhasan927 commented 1 week ago

In sample_diffusion_ldm_imagenet.py:

@torch.no_grad()
def convsample_ddim(model, shape, class_label, eta=1.0):
    ddim = DDIMSampler(model)

    n_samples_per_class = shape[0]
    ddim_steps = 20
    ddim_eta = 0.0
    scale = 3.0

    with model.ema_scope():
        uc = model.get_learned_conditioning(
            {model.cond_stage_key: torch.tensor(n_samples_per_class*[1000]).to(model.device)}
            )
        print(f"rendering {n_samples_per_class} examples of class '{class_label}' in {ddim_steps} steps and using s={scale:.2f}.")
        xc = torch.tensor(n_samples_per_class*[class_label])
        c = model.get_learned_conditioning({model.cond_stage_key: xc.to(model.device)})

        samples_ddim, _ = ddim.sample(S=ddim_steps,
                                        conditioning=c,
                                        batch_size=n_samples_per_class,
                                        shape=[3, 64, 64],
                                        verbose=False,
                                        unconditional_guidance_scale=scale,
                                        unconditional_conditioning=uc, 
                                        eta=ddim_eta)

    return samples_ddim, _

Why do we set ddim_eta=0.0 and ignore the eta=1.0 supplied in the function arguments?
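I would have expected the argument to be passed through instead, e.g. (my own sketch, not the repo's code):

ddim_eta = eta  # honor the value supplied by the caller rather than hardcoding 0.0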

Especially since the recommended command-line invocation for the ImageNet LDM is:

python sample_diffusion_ldm_imagenet.py -r models/ldm/cin256-v2/model.ckpt -n 50 --batch_size 50 -c 20 -e 1.0  --seed 40 --ptq  --weight_bit <4 or 8> --quant_mode qdiff --cali_st 20 --cali_batch_size 32 --cali_n 256 --quant_act --act_bit <4 or 8> --a_sym --a_min_max --running_stat --cond --cali_data_path <cali_data_path> -l <output_path>

where we can note that -e 1.0 corresponds to eta=1.0.

Does this affect the expected FID of the generated images?

hatchetProject commented 1 week ago

Hi, we follow the parameter settings from latent diffusion. eta=1.0 is the value the open-sourced LDM-ImageNet model used during training, while eta=0.0 is the setting they used for sampling. In our case, you can ignore the -e argument for ImageNet. Sorry for the confusion.
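For background, eta scales the per-step noise in DDIM sampling: eta=0.0 makes the sampler deterministic, while eta=1.0 recovers DDPM-level stochasticity. A minimal sketch of the sigma schedule (ddim_sigmas is a hypothetical helper here, mirroring the formula used in LDM's make_ddim_sampling_parameters; numpy assumed):

import numpy as np

def ddim_sigmas(alphas_cumprod, ddim_timesteps, eta):
    # Cumulative alphas at the selected DDIM steps and at their predecessors.
    alphas = alphas_cumprod[ddim_timesteps]
    alphas_prev = np.concatenate(
        [[alphas_cumprod[0]], alphas_cumprod[ddim_timesteps[:-1]]]
    )
    # sigma_t = eta * sqrt((1 - a_prev) / (1 - a)) * sqrt(1 - a / a_prev)
    # eta = 0.0 -> all sigmas are zero: deterministic DDIM sampling.
    # eta = 1.0 -> sigmas match the DDPM posterior: fully stochastic sampling.
    return eta * np.sqrt(
        (1 - alphas_prev) / (1 - alphas) * (1 - alphas / alphas_prev)
    )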

Different parameters can lead to different FID scores, but I can't say how large the impact would be.

adilhasan927 commented 1 week ago

Ah ok, thank you, that makes sense.