Hi, I notice when you define the model_variancehere, first item of variance is set as posterior_variance[1].
The comment here explains that " we set initial (log-)variance like so to get a better decoder log likelihood". But I am still confused.
I want to know:
the motivation for replacing beta[0] as posterior_variance[1]
why replacing beta[0] as posterior_variance[1] can get a better decoder log-likelihood.
I find that compared with beta[0], posterior_variance[1] usually has a smaller value, which may result in a smaller log_probs. So I want to know what's the accurate definition of "a better log likelihood".
Could you give a more detailed explanation?
I am looking forward to your reply. Thanks.
Hi, I notice when you define the model_variance here, first item of variance is set as posterior_variance[1].
The comment here explains that " we set initial (log-)variance like so to get a better decoder log likelihood". But I am still confused.
I want to know:
Could you give a more detailed explanation? I am looking forward to your reply. Thanks.