openai / improved-diffusion

Release for Improved Denoising Diffusion Probabilistic Models
MIT License

How to understand the loss(loss_q0,loss_q1,loss_q2)? #71

Open shenxiaochenn opened 1 year ago

shenxiaochenn commented 1 year ago

| grad_norm | 0.0913   |
| loss      | 0.0621   |
| loss_q0   | 0.17     |
| loss_q1   | 0.0455   |
| loss_q2   | 0.0209   |
| loss_q3   | 0.00557  |
| mse       | 0.0583   |
| mse_q0    | 0.156    |
| mse_q1    | 0.0453   |
| mse_q2    | 0.0208   |
| mse_q3    | 0.00549  |
| samples   | 2.85e+06 |
| step      | 1.11e+04 |
| vb        | 0.00376  |
| vb_q0     | 0.0143   |
| vb_q1     | 0.000231 |
| vb_q2     | 0.000103 |
| vb_q3     | 7.26e-05 |

Hi, I got this output from your code, but I can't figure out what the q0, q1, q2, and q3 suffixes mean here. Thanks!

chensming commented 1 year ago

hi bros, have you understood these? I am confused, too.

shenxiaochenn commented 1 year ago

> hi bros, have you understood these? I am confused, too.

I must say, I don't know.

aobusi commented 1 year ago

Have you figured it out yet, bro?

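theneao commented 1 year ago

    # Each sample's loss is logged under the quartile (0-3) of its timestep.
    for sub_t, sub_loss in zip(ts.cpu().numpy(), values.detach().cpu().numpy()):
        quartile = int(4 * sub_t / diffusion.num_timesteps)
        logger.logkv_mean(f"{key}_q{quartile}", sub_loss)

That is, as the reverse process recovers the original image over steps 0 to T, the loss is computed at a few sampled points along the way.
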
aobusi commented 1 year ago

Thanks.

Stamatis8 commented 3 months ago

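I figured it out after a while. In diffusion, the loss is the sum of many sub-terms (see the variational lower bound formulation in the relevant paper), each corresponding to one denoising (diffusion) step of the diffusion process. Denote the loss term for diffusion step $i$ by $L_i$, with $i = 0, \dots, n_{\rm timesteps} - 1$, where $n_{\rm timesteps}$ is the number of steps in the diffusion process. Say the logger writes its output every $n_{\rm training}$ training steps, and let $L_{i,j}$ be the loss term for diffusion step $i$ computed at the $j^{\rm th}$ training step within a logging interval. In practice each training sample contributes one such term, at its own sampled timestep, as the zip over ts and values in the snippet above shows. The loss_qk terms reported to the user are then calculated as follows. Each $L_{i,j}$ is assigned to one of four sets according to the quartile of its diffusion step, matching the quartile computation in the snippet above:

$$Q_k = \left\{ L_{i,j} \;:\; \left\lfloor \frac{4i}{n_{\rm timesteps}} \right\rfloor = k \right\}, \qquad k \in \{0, 1, 2, 3\}.$$

Finally, the loss_qk term we see in the log file is the average over all $L_{i,j}$ in its respective set $Q_k$.

In other words, each loss_qk measures how well the $k^{\rm th}$ quartile of the diffusion process (i.e. the $k^{\rm th}$ quartile of denoising steps, counting from 0) is performing. The same holds for the mse_qk and vb_qk entries.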
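
The binning described in this thread can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 1000-step schedule and the toy loss are made up, and `quartile_of` and `quartile_means` are hypothetical helper names, not functions from the repository. The quartile formula is the one from the snippet quoted earlier, and the per-key averaging stands in for what repeated `logger.logkv_mean` calls appear to do.

```python
import random
from collections import defaultdict


def quartile_of(t, num_timesteps):
    """Map a timestep t in [0, num_timesteps) to a quartile index 0..3."""
    return int(4 * t / num_timesteps)


def quartile_means(timesteps, losses, num_timesteps, key="loss"):
    """Average per-sample losses within each timestep quartile.

    Every (t, loss) pair adds one value to the mean stored under
    f"{key}_q{quartile}", similar to accumulating a running mean per key.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, loss in zip(timesteps, losses):
        name = f"{key}_q{quartile_of(t, num_timesteps)}"
        sums[name] += loss
        counts[name] += 1
    return {name: sums[name] / counts[name] for name in sorted(sums)}


# Assumption: 1000 diffusion steps (hypothetical value for this sketch).
NUM_TIMESTEPS = 1000

# Assumption: a made-up toy loss that shrinks as t grows; a real model's
# per-timestep loss will look different.
rng = random.Random(0)
timesteps = [rng.randrange(NUM_TIMESTEPS) for _ in range(10_000)]
losses = [1.0 / (1.0 + t) for t in timesteps]

means = quartile_means(timesteps, losses, NUM_TIMESTEPS)
for name, value in means.items():
    print(f"{name}: {value:.5f}")
```

With 1000 steps, timesteps 0 to 249 land in q0, 250 to 499 in q1, 500 to 749 in q2, and 750 to 999 in q3. Each sample falls into exactly one bin, so an entry such as loss_q0 summarizes only the samples whose timestep was drawn from the first quarter of the schedule.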