At line 166 of splitcross.py, `entropy = -(head_entropy + tail_entropy)`, we need not add `head_entropy` again, because it has already been folded into the log probabilities: `results.append(head_entropy.view(-1, 1) + tail_entropy)`.
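To make the double counting concrete, here is a toy numeric sketch; the variable names mirror the quoted lines, but the values are made up:

```python
import torch

# Illustrative numbers for a single target word that lives in a tail cluster.
head_entropy = torch.tensor([-1.0])      # log p(tombstone) from the head softmax
tail_only    = torch.tensor([-2.0])      # log p(x = target | tail cluster)

# What logprob() appends, per the quoted line: the head term is already inside.
tail_entropy = head_entropy + tail_only  # = -3.0, the full log p(x)

# Line 166 then adds head_entropy a second time:
buggy   = -(head_entropy + tail_entropy) # 4.0: head term counted twice
correct = -tail_entropy                  # 3.0: the intended negative log-likelihood
print(buggy.item(), correct.item())
```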
I replaced this complicated SplitCrossEntropyLoss with the plain PyTorch cross-entropy loss, which produces the same results and seems to be only slightly slower.
https://github.com/salesforce/awd-lstm-lm/blob/32fcb42562aeb5c7e6c9dec3f2a3baaaf68a5cb5/splitcross.py#L137
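For reference, a minimal sketch of that swap, assuming the model's decoder projects the RNN outputs to the full vocabulary (the sizes and names below are illustrative, not the repo's actual variables):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden, tokens = 10000, 400, 32
decoder = nn.Linear(hidden, vocab)        # stands in for the model's decoder
output = torch.randn(tokens, hidden)      # stands in for the flattened RNN outputs
targets = torch.randint(0, vocab, (tokens,))

# One full softmax over the vocabulary instead of SplitCrossEntropyLoss:
loss = F.cross_entropy(decoder(output), targets)
print(loss.item())
```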
For a word in a tail cluster, the probability of that word is p(C) * p(x=target|C), so its log probability is log(p(C) * p(x=target|C)) = log(p(C)) + log(p(x=target|C)).
We can just compute the cross entropy on the head (including the tombstones), then compute the cross entropy on each tail, so there is no need to add `head_entropy` again below. A sketch of this scheme follows.
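Here is a minimal runnable sketch of that decomposition with toy sizes (all names are illustrative): a head softmax over 4 frequent words plus one tombstone, and a tail softmax over 6 rare words. The head log-probability is added exactly once, and the sanity check confirms the two-level distribution is properly normalized:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

head_logits = torch.randn(3, 5)  # batch of 3; head vocab of 4 + tombstone at index 4
tail_logits = torch.randn(3, 6)  # same batch scored against the tail vocab

head_logp = F.log_softmax(head_logits, dim=-1)
tail_logp = F.log_softmax(tail_logits, dim=-1)

# Targets that all fall in the tail; indices are within the tail vocab.
tail_targets = torch.tensor([2, 0, 5])

# log p(x) = log p(tombstone) + log p(x | tail): the head term appears exactly once.
logp = head_logp[:, 4] + tail_logp.gather(1, tail_targets.view(-1, 1)).squeeze(1)
loss = -logp.mean()
print(loss)

# Sanity check: head words plus tombstone-weighted tail words sum to probability 1.
full_prob = torch.cat([head_logp[:, :4].exp(),
                       head_logp[:, 4:5].exp() * tail_logp.exp()], dim=1)
print(full_prob.sum(dim=1))  # ~1.0 for each row
```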