Thanks for asking. MML is supposed to calculate the marginal likelihood $P(Y) = \sum_{y \in Y} P(y)$ over the answer candidates and compute the cross entropy loss based on it. In the implementation, `loss_tensor` contains the cross entropy loss for every $y$, which is equivalent to $-\log P(y)$. So we put $-$ and then $\exp$ to re-calculate $P(y)$, sum over the answer candidates, and then apply $\log$ and $-$ again to convert it back to a valid cross entropy loss:

$$\mathcal{L}_\text{MML} = -\log \sum_{y} \exp(-\text{loss}(y))$$

This doesn't explain `- 1e10 * (loss_tensor==0).float()` yet. The reason for this term is to remove the impact of dummy answers whose losses are 0. (See this line for how it was set to zero.) If we simply used `exp`, these zeros would be converted to ones. So we give these entries a large negative value before applying `exp`, so that they become zeros once `exp` is applied.
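For concreteness, here is a minimal PyTorch sketch of the computation described above, assuming `loss_tensor` has shape `(batch, num_candidates)` with 0 marking dummy candidates; the function name `take_mml` and the final per-example reduction are illustrative assumptions, not necessarily the exact code in this repo.

```python
import torch

def take_mml(loss_tensor):
    # loss_tensor: (batch, num_candidates) of per-candidate cross entropy
    # losses, where 0 marks a padded/dummy answer candidate (assumed layout).
    # exp(-loss) recovers P(y); the -1e10 mask sends dummy entries to
    # exp(-1e10) ~= 0 instead of exp(0) = 1.
    marginal = torch.sum(
        torch.exp(-loss_tensor - 1e10 * (loss_tensor == 0).float()), dim=1)
    # -log converts the summed likelihood back to a cross entropy loss;
    # the clamp guards against log(0) when a row has only dummy candidates.
    return -torch.log(marginal.clamp(min=1e-10))

losses = torch.tensor([[2.3, 1.6, 0.0],
                       [0.7, 0.0, 0.0]])  # 0.0 = dummy candidate
print(take_mml(losses))  # one MML loss per example
```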
Thanks a lot for your reply! It's very helpful.
Hi, thanks for sharing your work! I'm a little confused about the `_take_mml` function. Could you tell me the reason why you add `- 1e10 * (loss_tensor==0).float()` here? Thanks!