OleNet closed this issue 3 years ago
My implementation follows the definition of BPC in https://arxiv.org/pdf/1308.0850.pdf (page 8) and is consistent with the implementations in Transformer-XL and Adaptive-Span Transformer, as well as with the StackOverflow thread you posted.
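As for the paper you mentioned, I think that's a typo in the paper. NLL already contains the log: it is the negative log-likelihood, and torch.nn.NLLLoss takes log-probabilities as its input (https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss). Since the loss is measured in nats, converting it to bits only requires dividing by ln(2); taking log2 of the loss would apply a logarithm twice.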
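To make the conversion concrete, here is a minimal sketch in plain Python (no PyTorch) that checks the relationship on a toy example. The helper name `nats_to_bpc` and the uniform 4-character distribution are assumptions made for illustration only; they are not taken from the repository's code.

```python
import math


def nats_to_bpc(nll_nats: float) -> float:
    """Convert an average per-character NLL in nats to bits per character.

    Hypothetical helper for illustration: a change of logarithm base
    from e to 2 is a division by ln(2).
    """
    return nll_nats / math.log(2)


# Toy data (assumed): the model assigns probability 0.25 to each of
# four observed characters, i.e. a uniform guess over 4 symbols.
probs = [0.25, 0.25, 0.25, 0.25]

# Average negative log-likelihood with the natural log (nats),
# which is the unit a cross-entropy / NLL loss is normally reported in.
nll_nats = -sum(math.log(p) for p in probs) / len(probs)

# The same quantity computed directly with a base-2 log (bits).
bpc_direct = -sum(math.log2(p) for p in probs) / len(probs)

# Converting the nats value by dividing by ln(2) gives the same number.
bpc_converted = nats_to_bpc(nll_nats)

# Taking log2 of the loss instead gives a different, unrelated value.
log2_of_nll = math.log2(nll_nats)

print(bpc_direct, bpc_converted, log2_of_nll)
```

A uniform guess over 4 symbols should cost 2 bits per character, which is what both the direct base-2 computation and the division by ln(2) give (up to floating-point rounding), while log2(NLL) does not.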
OK, I got it, thanks.
According to the material I have found here and here, bpc = log2(NLL). But in the implementation in your code, I found that bpc = NLL / log2. Is there something wrong with the calculation of bpc, or have I missed something?