lucidrains / MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch
MIT License

Evaluation metric bits-per-byte #14

Status: Open. jxiw opened this issue 1 year ago.

jxiw commented 1 year ago

Hi there,

The MEGABYTE paper reports bits-per-byte (BPB) in Table 2 as its evaluation metric. This seems to differ from byte-level perplexity: the values reported for arXiv and Code are below 1, and perplexity can never be below 1, so the metric cannot be perplexity. This repo trains with a cross-entropy loss, from which byte-level perplexity is easy to compute. How can I compute the bits-per-byte metric?

Thanks a lot.
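A short derivation of why a BPB value below 1 is consistent with a perplexity that is always at least 1. It assumes $\mathcal{L}$ is the mean per-byte cross-entropy in nats (natural log) over $N$ target bytes $x_1, \dots, x_N$:

```latex
\begin{aligned}
\mathcal{L} &= -\frac{1}{N}\sum_{i=1}^{N} \ln p(x_i \mid x_{<i}) \\
\mathrm{PPL} &= e^{\mathcal{L}} \\
\mathrm{BPB} &= \frac{\mathcal{L}}{\ln 2} = \mathcal{L}\,\log_2 e \\
\mathrm{PPL} &= 2^{\mathrm{BPB}}
\end{aligned}
```

Both numbers come from the same loss. Since $\mathcal{L} \ge 0$, perplexity is at least 1, while BPB is at least 0. A BPB below 1 simply means a byte-level perplexity below 2 (for example, $\mathrm{BPB} = 0.9$ gives $\mathrm{PPL} = 2^{0.9} \approx 1.87$).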

eegli commented 2 months ago

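BPB can be computed from the cross-entropy loss that the model returns. PyTorch's cross-entropy uses the natural log, so when each token is one byte the loss is in nats per byte. Multiplying by log2(e) (equivalently, dividing by ln 2) converts it to bits per byte: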

```python
import math

def cc_to_bpb(cc_loss: float) -> float:
    return cc_loss * math.log2(math.e)  # nats -> bits, since log2(e) = 1 / ln(2)
```
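When evaluating over a whole dataset, one detail matters: batches may score different numbers of bytes (e.g. a short final batch), so per-batch mean losses should be weighted by byte count before converting. The sketch below assumes each batch yields a mean cross-entropy in nats over its byte targets, which is what PyTorch's `F.cross_entropy` returns with its default `reduction='mean'`. The helper names `nats_to_bpb`, `dataset_bpb`, and `bpb_to_byte_perplexity` and the batch numbers are hypothetical, chosen for illustration only.

```python
import math
from typing import Iterable, Tuple

LOG2_E = math.log2(math.e)  # equals 1 / ln(2); converts nats to bits


def nats_to_bpb(mean_loss_nats: float) -> float:
    """Convert a mean per-byte cross-entropy (in nats) to bits-per-byte."""
    return mean_loss_nats * LOG2_E


def dataset_bpb(batches: Iterable[Tuple[float, int]]) -> float:
    """Aggregate (mean_loss_nats, num_bytes) pairs into a single BPB value.

    Each batch's mean loss is weighted by the number of bytes it scored,
    so batches of different sizes contribute in proportion to their bytes.
    """
    total_nats = 0.0
    total_bytes = 0
    for mean_loss, num_bytes in batches:
        total_nats += mean_loss * num_bytes  # recover the summed loss
        total_bytes += num_bytes
    if total_bytes == 0:
        raise ValueError("no bytes were evaluated")
    return nats_to_bpb(total_nats / total_bytes)


def bpb_to_byte_perplexity(bpb: float) -> float:
    """Byte-level perplexity is 2 ** BPB (the same as exp of the nats loss)."""
    return 2.0 ** bpb


if __name__ == "__main__":
    # Hypothetical evaluation batches: (mean cross-entropy in nats, bytes scored).
    batches = [(0.62, 8192), (0.58, 8192), (0.70, 1024)]
    bpb = dataset_bpb(batches)
    print(f"bits-per-byte: {bpb:.4f}")
    print(f"byte perplexity: {bpb_to_byte_perplexity(bpb):.4f}")
```

Accumulating the summed loss and byte count, rather than averaging per-batch BPB values, keeps the result identical to computing the loss over all bytes at once. If padding or ignored targets are excluded from the loss, `num_bytes` should count only the scored bytes.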