MADE: Masked Autoencoder for Distribution Estimation

Loss function #4

Open · dkirkby opened this issue 5 years ago

dkirkby commented 5 years ago

I am having trouble understanding your loss function defined [here]() as:

```python
pre_output = self.layers[-1].lin_output
log_prob = -T.sum(T.nnet.softplus(-target * pre_output + (1 - target) * pre_output), axis=1)
loss = (-log_prob).mean()
```

It looks like the softplus argument simplifies to `(1 - 2 * target) * pre_output`, but does this form have better numerics? Why is the softplus used here?
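
For what it's worth, here is a quick numpy check I ran (my own sketch with hypothetical values chosen to stress the numerics, not code from this repo). It shows the two forms of the argument agree exactly, and that the softplus form stays finite where a naive log-of-sigmoid cross entropy does not:

```python
import numpy as np

def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)

# hypothetical pre-activations and binary targets; 40.0 is large
# enough that sigmoid rounds to exactly 1.0 in float64
pre_output = np.array([-50.0, -2.0, 0.5, 40.0])
target = np.array([0.0, 1.0, 1.0, 0.0])

# the form used in the repo
a = softplus(-target * pre_output + (1 - target) * pre_output)
# the simplified argument
b = softplus((1 - 2 * target) * pre_output)
print(np.allclose(a, b))  # True

# naive binary cross entropy on sigmoid outputs blows up:
# sigmoid(40.0) == 1.0 in float64, so log(1 - p) is -inf
# (numpy warns about the log(0) here)
p = 1.0 / (1.0 + np.exp(-pre_output))
naive = -(target * np.log(p) + (1 - target) * np.log(1 - p))
print(np.isfinite(naive).all(), np.isfinite(a).all())  # False True
```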

How does this loss relate to eqn (5) of your paper, which looks like a standard binary cross entropy?
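
For reference, my current reading (please correct me if I'm wrong): writing `xhat = sigmoid(pre_output)`, the two log terms of eqn (5) are `-log(xhat) = softplus(-pre_output)` and `-log(1 - xhat) = softplus(pre_output)`, so for a binary target in {0, 1} the single softplus term selects exactly the cross-entropy branch that eqn (5) keeps. A quick check of those two identities:

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 11)   # hypothetical pre-activations
xhat = 1.0 / (1.0 + np.exp(-x))  # xhat = sigmoid(pre_output)

# -log(xhat)     == softplus(-x)
print(np.allclose(-np.log(xhat), np.logaddexp(0.0, -x)))     # True
# -log(1 - xhat) == softplus(x)
print(np.allclose(-np.log(1 - xhat), np.logaddexp(0.0, x)))  # True
```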