meta-llama / llama3

The official Meta Llama 3 GitHub site

Can Llama 3 be deployed on a T4? #95

Open ucsdzehualiu opened 4 months ago

ucsdzehualiu commented 4 months ago

As titled: what are the minimum hardware requirements for the 8B and 70B models?

AvisP commented 4 months ago

If you are loading the 8B model in bfloat16, it won't fit on a single GPU, as it takes 15.5 GB of memory. The quantized 8-bit/4-bit versions can fit on one T4.
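A quick back-of-the-envelope check of why those numbers come out the way they do (an illustrative sketch, not from the thread; the ~8.03B parameter count is an approximate figure): weight memory is roughly parameter count × bytes per parameter, before activations and KV cache.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-only memory estimate in GB (ignores activations and KV cache)."""
    return n_params * bytes_per_param / 1e9

# Llama 3 8B is roughly 8.03e9 parameters (approximate figure).
n = 8.03e9
print(weight_memory_gb(n, 2.0))  # bfloat16 (2 bytes/param): ~16 GB, over a 16 GB T4 once overhead is added
print(weight_memory_gb(n, 0.5))  # 4-bit (0.5 bytes/param): ~4 GB, fits comfortably on one T4
```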

Here is the memory requirement for bfloat16 in a Kaggle notebook:

[Screenshot (2024-04-23): bfloat16 memory usage]

And here is the requirement for the 4-bit quantized model using NF4:

[Screenshot (2024-04-23): 4-bit NF4 memory usage]

LouisChen15 commented 3 months ago

Any idea how to load the quantized version?

AvisP commented 3 months ago

Check out this notebook. You can run it on Kaggle with the accelerator set to T4 GPUs.