Thanks for asking. MML is supposed to calculate the marginal likelihood $P(Y) = \sum_{y \in Y} P(y)$ over the answer candidates and compute the cross entropy loss based on it. In the implementation, `loss_tensor` contains the cross entropy loss for every $y$, which is equivalent to $-\log P(y)$. So we put $-$ and then $\exp$ to re-calculate $P(y)$, sum over the answer candidates, and then apply $\log$ and $-$ again to convert it back to a valid cross entropy loss:

$$\mathcal{L}_\text{MML} = -\log \sum_{y} \exp(-\text{loss}(y))$$

This doesn't explain `- 1e10 * (loss_tensor==0).float()` yet. The reason for this term is to remove the impact of dummy answers whose losses are 0. (See this line for how it was set to zero.) If we simply used `exp`, these zeros would be converted to ones. So we give these entries a large negative value before applying `exp`, so that they become zeros once `exp` is applied.
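For concreteness, here is a minimal PyTorch sketch of the computation described above, assuming `loss_tensor` has shape `(batch, num_candidates)` with 0 marking dummy candidates; the function name `take_mml` and the final per-example reduction are illustrative assumptions, not necessarily the exact code in this repo.

```python
import torch

def take_mml(loss_tensor):
    # loss_tensor: (batch, num_candidates) of per-candidate cross entropy
    # losses, where 0 marks a padded/dummy answer candidate (assumed layout).
    # exp(-loss) recovers P(y); the -1e10 mask sends dummy entries to
    # exp(-1e10) ~= 0 instead of exp(0) = 1.
    marginal = torch.sum(
        torch.exp(-loss_tensor - 1e10 * (loss_tensor == 0).float()), dim=1)
    # -log converts the summed likelihood back to a cross entropy loss;
    # the clamp guards against log(0) when a row has only dummy candidates.
    return -torch.log(marginal.clamp(min=1e-10))

losses = torch.tensor([[2.3, 1.6, 0.0],
                       [0.7, 0.0, 0.0]])  # 0.0 = dummy candidate
print(take_mml(losses))  # one MML loss per example
```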
Thanks a lot for your reply! It's very helpful.
Hi, thanks for sharing your work! I'm a little confused about the `_take_mml` function. Could you tell me the reason why you add `- 1e10 * (loss_tensor==0).float()` here? Thanks!