Closed hangg7 closed 5 years ago
Hi,
About computing p(x|T), the technique currently listed in the paper is wrong:
It's clear that you are trying to normalize all scores in a batch with a softmax, where the score function is the sum over the masked activations, i.e.,
however the trace operation is not correct, since it would end in
Please consider making this correction. It may be a minor mistake, but it causes unnecessary confusion.
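For anyone who wants to check the discrepancy numerically, here is a minimal sketch. It assumes the intended score is the elementwise sum of x[i][j] * T[i][j], and that the expression in question is the trace of a plain matrix product of x and T. The helper names (`masked_sum`, `trace`, `matmul`, `softmax`) and the toy values are illustrative only and do not come from the paper or its code.

```python
import math

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def masked_sum(x, t):
    # Assumed intended score: sum over elementwise products x[i][j] * t[i][j].
    return sum(x[i][j] * t[i][j]
               for i in range(len(x)) for j in range(len(x[0])))

def softmax(scores):
    # Numerically stable softmax over a list of scalar scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy feature map x and an asymmetric template T (hypothetical values).
x = [[1.0, 2.0], [3.0, 4.0]]
t = [[0.0, 1.0], [0.0, 0.0]]

score_intended = masked_sum(x, t)                   # sum_ij x_ij * T_ij
score_trace = trace(matmul(x, t))                   # sum_ij x_ij * T_ji
score_transposed = trace(matmul(transpose(x), t))   # sum_ij x_ij * T_ij

# Normalizing the intended scores over a small batch of feature maps.
batch = [x, [[0.0, 5.0], [1.0, 1.0]]]
probs = softmax([masked_sum(xi, t) for xi in batch])
```

With these values, the trace of the plain product differs from the elementwise sum, while the trace of the product with x transposed matches it. The two only coincide in general when T (or x) is symmetric.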
Hi @cullengao, I also found this typo.