jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Figure 1 clarification on batch size and sequence length #57

Open psandovalsegura opened 4 months ago

psandovalsegura commented 4 months ago

In Figure 1, what are the batch size, sequence length, and vocabulary size? They aren't clear from the caption. I would expect activations to take up more space. From what I can tell, the settings imply a batch size of 256, a sequence length of 2048, and a vocabulary size of 32,000.

At those settings, the logits of the LLaMA model alone should take 256 * 2048 * 32000 * 2 bytes, or 31.25 GB in BF16. Where is this required memory in Figure 1?
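
For reference, here is that arithmetic as a small script. The batch size, sequence length, and 2-byte (BF16) element size are my assumptions, not values stated in the paper:

```python
# Rough size of the output logits alone, under my assumed settings.
batch_size = 256       # assumed batch size (sequences)
seq_len = 2048         # assumed sequence length
vocab_size = 32000     # LLaMA tokenizer vocabulary
bytes_per_elem = 2     # BF16

logits_bytes = batch_size * seq_len * vocab_size * bytes_per_elem
print(f"logits: {logits_bytes / 1024**3:.2f} GiB")  # ~31.25 GiB
```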

Thanks!

psandovalsegura commented 3 months ago

Even if "token batch size" means the input is a single sequence of 256 tokens, I am unable to reproduce 2 GB of activation memory.
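
For what it's worth, here is the back-of-the-envelope estimate I'm working from. It is only a sketch: it counts one BF16 hidden-state tensor per transformer layer plus the logits, assumes LLaMA-7B shapes (hidden size 4096, 32 layers, vocabulary 32,000), and ignores attention scores and MLP intermediates, so the true number depends on which tensors are kept for backward. Even with generous undercounting, it lands far below 2 GB:

```python
# Back-of-the-envelope activation estimate for LLaMA-7B (assumed shapes),
# counting only one hidden-state tensor per layer plus the logits, in BF16.
batch_size = 1        # "token batch size" of 256 read as a single sequence
seq_len = 256
hidden_size = 4096    # LLaMA-7B (assumed)
num_layers = 32       # LLaMA-7B (assumed)
vocab_size = 32000
bytes_per_elem = 2    # BF16

hidden_states = batch_size * seq_len * hidden_size * num_layers * bytes_per_elem
logits = batch_size * seq_len * vocab_size * bytes_per_elem
total = hidden_states + logits

print(f"hidden states: {hidden_states / 1024**2:.1f} MiB")  # ~64 MiB
print(f"logits:        {logits / 1024**2:.1f} MiB")         # ~15.6 MiB
print(f"total:         {total / 1024**2:.1f} MiB")          # well under 2 GB
```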