tloen / llama-int8

Quantized inference code for LLaMA models
GNU General Public License v3.0
1.05k stars 105 forks

Can 65B run on 4*32G GPU? #11

Open zhongtao93 opened 1 year ago