What's the meaning of "For fixedlarge, we set initial (log-)variance like os to get a better decoder log likelihood"?

Hi, I notice when you define the model_variance here, first item of variance is set as posterior_variance[1].

The comment here explains that " we set initial (log-)variance like so to get a better decoder log likelihood". But I am still confused.

I want to know:

the motivation for replacing beta[0] as posterior_variance[1]
why replacing beta[0] as posterior_variance[1] can get a better decoder log-likelihood.
I find that compared with beta[0], posterior_variance[1] usually has a smaller value, which may result in a smaller log_probs. So I want to know what's the accurate definition of "a better log likelihood".

Could you give a more detailed explanation? I am looking forward to your reply. Thanks.

openai / guided-diffusion