meta-llama / llama3

The official Meta Llama 3 GitHub site

Can Llama 3 be deployed on a T4? #95

Open ucsdzehualiu opened 4 months ago

ucsdzehualiu commented 4 months ago

As titled: what are the minimum hardware requirements for the 8B and 70B models?

AvisP commented 4 months ago

If you are loading the 8B model in bfloat16, it won't fit on a single GPU, as it takes 15.5 GB of memory. The quantized 8-bit/4-bit versions can fit on one T4.
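A quick back-of-the-envelope check of why those numbers come out the way they do (an illustrative sketch, not from the thread; the ~8.03B parameter count is an approximate figure): weight memory is roughly parameter count × bytes per parameter, before activations and KV cache.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-only memory estimate in GB (ignores activations and KV cache)."""
    return n_params * bytes_per_param / 1e9

# Llama 3 8B is roughly 8.03e9 parameters (approximate figure).
n = 8.03e9
print(weight_memory_gb(n, 2.0))  # bfloat16 (2 bytes/param): ~16 GB, over a 16 GB T4 once overhead is added
print(weight_memory_gb(n, 0.5))  # 4-bit (0.5 bytes/param): ~4 GB, fits comfortably on one T4
```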

Here is the memory requirement for bfloat16 in a Kaggle notebook:

[Screenshot (2024-04-23): bfloat16 memory usage]

And here is the requirement for the 4-bit quantized model using NF4:

[Screenshot (2024-04-23): 4-bit NF4 memory usage]

LouisChen15 commented 3 months ago

Any idea how to load the quantized version?

AvisP commented 3 months ago

Check out this notebook. You can run it on Kaggle with the accelerator set to T4 GPUs.