Closed luluxing3 closed 5 years ago

In ddm/alg/cla_models_multihead.py, line 213:

self.cost = tf.div(self._KL_term(), training_size) - self._logpred(self.x, self.y, self.task_idx)

Why do we need to divide by training_size, which is 60,000 for the permuted MNIST task?
Hi @luluxing3

By definition, the variational lower bound is the expected log-likelihood minus the KL term. We can equivalently divide the whole objective by the constant training_size; this does not change the optimum and helps the optimizer converge better. Note that self._logpred(...) is already averaged over the batch size.
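To spell out the algebra behind this reply, here is a sketch of the scaled bound, where N stands for training_size and θ for the network weights (my notation, not the repo's):

```latex
% The variational lower bound: expected log-likelihood minus the KL term.
\mathcal{L}(q) = \sum_{i=1}^{N} \mathbb{E}_{q(\theta)}\!\left[\log p(y_i \mid x_i, \theta)\right]
               - \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta)\right)

% Dividing by the constant N and negating gives the minimization objective:
-\frac{\mathcal{L}(q)}{N} = \frac{\mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta)\right)}{N}
                          - \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}_{q(\theta)}\!\left[\log p(y_i \mid x_i, \theta)\right]
```

The second term is exactly what a batch-averaged log-likelihood estimates (the role of self._logpred here), and the first is tf.div(self._KL_term(), training_size), so self.cost is −ELBO/N. Since N is a constant, the scaling moves neither the optimum nor its location; it only keeps the gradient magnitude on a scale the optimizer handles well.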
I got it. Thanks for your explanation.