Hi there,
The Megabyte paper reports bits-per-byte in Table 2 as its evaluation metric. This seems to differ from byte-level perplexity: the numbers reported for arXiv and Code are below 1, and perplexity can never be below 1, so the metric cannot be perplexity. This repo uses the cross-entropy loss, from which byte-level perplexity is easy to calculate. May I ask how to compute the bits-per-byte metric?
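For reference, here is my current understanding of how the two metrics relate, as a minimal sketch. It assumes the loss is the mean cross-entropy per byte measured in nats (natural log), and that the model predicts exactly one byte per step. The function names and the example loss value are hypothetical and not taken from this repo or the paper; please correct me if the conversion is wrong.

```python
import math


def byte_level_perplexity(mean_ce_nats: float) -> float:
    """Perplexity per byte: exp of the mean cross-entropy (in nats per byte)."""
    return math.exp(mean_ce_nats)


def bits_per_byte(mean_ce_nats: float) -> float:
    """Convert mean cross-entropy from nats per byte to bits per byte.

    Dividing by ln(2) changes the logarithm base from e to 2.
    """
    return mean_ce_nats / math.log(2)


# Hypothetical mean cross-entropy per byte (in nats), for illustration only.
loss = 0.5
print("byte-level perplexity:", byte_level_perplexity(loss))
print("bits per byte:", bits_per_byte(loss))
```

Under these assumptions bits-per-byte is just log2 of the byte-level perplexity, so a value below 1 would correspond to a perplexity below 2.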
Thanks a lot.