tloen / llama-int8

Quantized inference code for LLaMA models
GNU General Public License v3.0

Is it possible to save the smaller weights so it doesn't have to convert them each time? #10

Open spullara opened 1 year ago

spullara commented 1 year ago

That would save startup time, wouldn't it?
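A minimal sketch of one way this could work, assuming the int8 conversion produces an ordinary `torch.nn.Module`: run the expensive fp16 → int8 conversion once, serialize the resulting `state_dict` with `torch.save`, and on later startups rebuild only the module structure and load the cached weights. `build_skeleton`, `convert_checkpoint`, and the cache filename below are hypothetical placeholders, not functions from this repo.

```python
import os
from typing import Callable

import torch


def load_or_convert(
    build_skeleton: Callable[[], torch.nn.Module],      # hypothetical: cheap, builds the model structure only
    convert_checkpoint: Callable[[], torch.nn.Module],  # hypothetical: expensive fp16 -> int8 conversion
    cache_path: str = "llama_int8_state.pt",            # hypothetical cache file
) -> torch.nn.Module:
    """Convert once, then reuse the cached int8 weights on later runs."""
    if os.path.exists(cache_path):
        # Warm start: skip re-quantization, just load the previously saved weights.
        model = build_skeleton()
        model.load_state_dict(torch.load(cache_path, map_location="cpu"))
    else:
        # Cold start: do the full conversion once and cache the result.
        model = convert_checkpoint()
        torch.save(model.state_dict(), cache_path)
    return model
```

Whether the int8 tensors round-trip cleanly through `state_dict` depends on how the quantized layers store their weights and scales, so this is only a sketch of the caching idea, not a drop-in change to this repo.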