huggingface / local-gemma

Gemma 2 optimized for your local machine.
Apache License 2.0

The first preset table in the README is wrong #28

Open phil71x opened 2 months ago

phil71x commented 2 months ago

The 3rd row of the table is a duplicate of the 2nd row.

sanchit-gandhi commented 1 month ago

Thanks for flagging, @phil71x! Note that the table is not wrong; it reflects the behaviour of the memory_extreme preset, which uses all available GPU memory and offloads weights to the CPU only if the entire model doesn't fit.

Since the GPU used to gather the numbers (an A100 80GB) fits the 9b model entirely, all of the weights are on the GPU and there is no CPU offload. In this case, memory_extreme collapses to memory, which is why the two rows report the same numbers. Only if the model weights didn't fit on the GPU would we see a difference here.

See this section for details: https://github.com/huggingface/local-gemma#preset-details
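
To illustrate the point above, here is a minimal sketch of a "fill the GPU first, offload the rest" policy. It is not the local-gemma implementation: the function name `split_weights` and the sizes in the example calls are hypothetical and chosen only to show when the CPU offload becomes non-zero.

```python
def split_weights(model_gb: float, gpu_free_gb: float) -> tuple[float, float]:
    """Return (gb_on_gpu, gb_offloaded_to_cpu) for a fill-GPU-first policy.

    Hypothetical helper for illustration; not part of local-gemma.
    """
    if model_gb < 0 or gpu_free_gb < 0:
        raise ValueError("sizes must be non-negative")
    # Place as much of the model as fits on the GPU.
    on_gpu = min(model_gb, gpu_free_gb)
    # Whatever is left over is offloaded to the CPU.
    on_cpu = model_gb - on_gpu
    return on_gpu, on_cpu


# Illustrative sizes only (not measured values).
print(split_weights(6.0, 80.0))  # model fits: (6.0, 0.0), no CPU offload
print(split_weights(6.0, 4.0))   # model does not fit: (4.0, 2.0)
```

When the model fits, the offloaded amount is zero, so an offloading preset places the weights exactly as a GPU-only preset would. That matches the identical rows in the table.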