dkirkby opened this issue 5 years ago
I am having trouble understanding your loss function defined [here]() as:
```python
pre_output = self.layers[-1].lin_output
log_prob = -T.sum(T.nnet.softplus(-target * pre_output + (1 - target) * pre_output), axis=1)
loss = (-log_prob).mean()
```
It looks like the softplus argument simplifies to `(1 - 2 * target) * pre_output`, but does this form have better numerics? Why is `softplus` used here? (Quick check of the simplification below.)
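Here is the small NumPy check I used to convince myself of that simplification. This is just my own sketch, not code from the repo; `softplus_np` is a stand-in for `T.nnet.softplus` and the shapes/values are arbitrary:

```python
import numpy as np

def softplus_np(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

rng = np.random.default_rng(0)
pre_output = rng.normal(size=(4, 3))                        # arbitrary pre-activations
target = rng.integers(0, 2, size=(4, 3)).astype(float)      # binary targets

arg_original = -target * pre_output + (1 - target) * pre_output
arg_simplified = (1 - 2 * target) * pre_output
print(np.allclose(arg_original, arg_simplified))  # True
```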
How does this loss relate to eqn (5) of your paper, which looks like a standard binary cross entropy?
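For reference, this is how I am currently reading that connection: for binary targets, `softplus((1 - 2 * target) * pre_output)` seems to equal the per-element binary cross entropy computed through an explicit sigmoid, since log σ(z) = -softplus(-z) and log(1 - σ(z)) = -softplus(z). Again, this is just my own NumPy sketch of that reading, not the repo's Theano code:

```python
import numpy as np

def softplus_np(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

def sigmoid_np(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
pre_output = rng.normal(size=(4, 3))
target = rng.integers(0, 2, size=(4, 3)).astype(float)

# standard binary cross entropy through an explicit sigmoid
prob = sigmoid_np(pre_output)
bce = -(target * np.log(prob) + (1 - target) * np.log(1 - prob))

# the softplus form used in the loss, per element
via_softplus = softplus_np((1 - 2 * target) * pre_output)

print(np.allclose(bce, via_softplus))  # True for binary targets
```

So my guess is the softplus form is the "from logits" version of eqn (5), but I would appreciate confirmation that this is the intended reading.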