Open phil71x opened 4 months ago
Thanks for flagging @phil71x! Note that the table is not wrong, but rather reflects the behaviour of the `memory_extreme` preset: `memory_extreme` will use all available GPU memory, and offload weights to the CPU if the entire model doesn't fit. Since the GPU used to gather the numbers (A100 80GB) fits the 9b model entirely, all of the weights are on the GPU, and thus there is no CPU offload. In this case, `memory_extreme` collapses to `memory`. Only if the model weights didn't fit on the GPU would we see a difference here.
See this section for details: https://github.com/huggingface/local-gemma#preset-details
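To make the collapse concrete, here is a minimal sketch of that decision rule. This is a hypothetical helper for illustration only (`resolve_preset`, the sizes, and the return strings are assumptions, not local-gemma's actual code): CPU offload kicks in only when the whole model doesn't fit in free GPU memory, so on a big enough GPU `memory_extreme` behaves exactly like `memory`.

```python
def resolve_preset(model_size_gb: float, gpu_free_gb: float) -> str:
    """Hypothetical illustration of the memory_extreme behaviour:
    offload weights to the CPU only when the entire model does not
    fit on the GPU."""
    if model_size_gb <= gpu_free_gb:
        # Model fits entirely on the GPU -> no CPU offload;
        # memory_extreme collapses to memory.
        return "memory"
    # Model too large for the GPU -> spill the remainder to CPU RAM.
    return "memory_extreme"

# An A100 80GB comfortably fits the 9b model's weights (assumed
# ~18GB in bf16), so no offload happens in the benchmark setup.
print(resolve_preset(18.0, 80.0))
```

On a smaller GPU (say 16GB free), the same call would return `"memory_extreme"` and the two presets would diverge in the table.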
The 3rd line is a duplicate of the 2nd line.