At line 166 of splitcross.py, `entropy = -(head_entropy + tail_entropy)`, we need not add `head_entropy` again, because it has already been folded into the log probabilities: `results.append(head_entropy.view(-1, 1) + tail_entropy)`.
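To make the double counting concrete, here is a toy numeric sketch; the variable names mirror the quoted lines, but the values are made up:

```python
import torch

# Illustrative numbers for a single target word that lives in a tail cluster.
head_entropy = torch.tensor([-1.0])      # log p(tombstone) from the head softmax
tail_only    = torch.tensor([-2.0])      # log p(x = target | tail cluster)

# What logprob() appends, per the quoted line: the head term is already inside.
tail_entropy = head_entropy + tail_only  # = -3.0, the full log p(x)

# Line 166 then adds head_entropy a second time:
buggy   = -(head_entropy + tail_entropy) # 4.0: head term counted twice
correct = -tail_entropy                  # 3.0: the intended negative log-likelihood
print(buggy.item(), correct.item())
```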
I replaced this complicated SplitCrossEntropyLoss with the plain PyTorch cross-entropy loss, which produces the same results and seems to be only slightly slower.
https://github.com/salesforce/awd-lstm-lm/blob/32fcb42562aeb5c7e6c9dec3f2a3baaaf68a5cb5/splitcross.py#L137
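For reference, a minimal sketch of that swap, assuming the model's decoder projects the RNN outputs to the full vocabulary (the sizes and names below are illustrative, not the repo's actual variables):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden, tokens = 10000, 400, 32
decoder = nn.Linear(hidden, vocab)        # stands in for the model's decoder
output = torch.randn(tokens, hidden)      # stands in for the flattened RNN outputs
targets = torch.randint(0, vocab, (tokens,))

# One full softmax over the vocabulary instead of SplitCrossEntropyLoss:
loss = F.cross_entropy(decoder(output), targets)
print(loss.item())
```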
For a word in a tail cluster, the probability of that word is p(C) * p(x=target|C), so its log probability is log(p(C) * p(x=target|C)) = log(p(C)) + log(p(x=target|C)).
We can just compute the cross entropy on the head (including the tombstones), then compute the cross entropy on each tail, so there is no need to add `head_entropy` again below. A sketch of this scheme follows.
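Here is a minimal runnable sketch of that decomposition with toy sizes (all names are illustrative): a head softmax over 4 frequent words plus one tombstone, and a tail softmax over 6 rare words. The head log-probability is added exactly once, and the sanity check confirms the two-level distribution is properly normalized:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

head_logits = torch.randn(3, 5)  # batch of 3; head vocab of 4 + tombstone at index 4
tail_logits = torch.randn(3, 6)  # same batch scored against the tail vocab

head_logp = F.log_softmax(head_logits, dim=-1)
tail_logp = F.log_softmax(tail_logits, dim=-1)

# Targets that all fall in the tail; indices are within the tail vocab.
tail_targets = torch.tensor([2, 0, 5])

# log p(x) = log p(tombstone) + log p(x | tail): the head term appears exactly once.
logp = head_logp[:, 4] + tail_logp.gather(1, tail_targets.view(-1, 1)).squeeze(1)
loss = -logp.mean()
print(loss)

# Sanity check: head words plus tombstone-weighted tail words sum to probability 1.
full_prob = torch.cat([head_logp[:, :4].exp(),
                       head_logp[:, 4:5].exp() * tail_logp.exp()], dim=1)
print(full_prob.sum(dim=1))  # ~1.0 for each row
```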