yzh119 / BPT

Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
MIT License

I think there is a bug in the implementation of bpc #2

Closed OleNet closed 3 years ago

OleNet commented 3 years ago

According to the material I found here and here, bpc = log2(NLL). But in your code, I found that bpc = NLL / log(2). Is there something wrong with the calculation of bpc, or have I missed something?

yzh119 commented 3 years ago

My implementation follows the definition of BPC in https://arxiv.org/pdf/1308.0850.pdf (page 8) and is consistent with the implementations in Transformer-XL and adaptive-span transformer, as well as with the StackOverflow thread you posted.

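As for the paper you mentioned, I think that's a typo in the paper. NLL is already a log-domain quantity (the loss is computed from log-probabilities, measured in nats), so converting it to bits only requires dividing by log(2), not taking another log: https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss.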
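
To make the unit conversion concrete, here is a minimal sketch in plain Python. It is not taken from this repo: the helper names `mean_nll` and `nats_to_bpc` are hypothetical, and the toy "model" simply assigns a uniform probability of 1/256 to every character, so the expected answer is 8 bits per character.

```python
import math


def mean_nll(probs):
    """Average negative log-likelihood in nats (natural log) over the
    probabilities a model assigned to the correct characters."""
    return -sum(math.log(p) for p in probs) / len(probs)


def nats_to_bpc(nll_nats):
    """Convert an average per-character NLL from nats to bits.

    log2(p) = ln(p) / ln(2), so the change of base is a division by ln(2).
    """
    return nll_nats / math.log(2)


# Toy case (an assumption for illustration): a uniform model over 256 symbols.
probs = [1 / 256] * 4
nll = mean_nll(probs)      # equals ln(256) nats
bpc = nats_to_bpc(nll)     # about 8 bits per character

# Taking log2 of the NLL itself gives a number with no such interpretation.
wrong = math.log2(nll)

print(f"NLL (nats): {nll:.4f}")
print(f"BPC = NLL / ln(2): {bpc:.4f}")
print(f"log2(NLL): {wrong:.4f}")
```

A uniform guess over 256 symbols costs 8 bits per character by construction, which is what NLL / log(2) recovers; log2(NLL) gives a different value.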

OleNet commented 3 years ago

OK, I got it, thanks.