tloen/llama-int8
Quantized inference code for LLaMA models
GNU General Public License v3.0
1.05k stars · 105 forks
Is it possible to save the smaller weights so it doesn't have to convert them each time?
#10 · Open

spullara opened this issue 1 year ago

spullara commented 1 year ago
That would save startup time, wouldn't it?
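The idea should be workable in principle: run the float-to-int8 conversion once, serialize the quantized tensors to disk, and load that cache on later runs instead of re-quantizing. A minimal NumPy sketch of the pattern (not this repo's actual code; the quantization scheme, file name, and function names are illustrative assumptions):

```python
import numpy as np

def quantize_int8(w):
    # Illustrative symmetric per-tensor int8 quantization:
    # store the int8 values plus a single float scale.
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float32 tensor from the int8 cache.
    return q.astype(np.float32) * scale

# One-time conversion: quantize and cache to disk.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
np.savez("weights_int8.npz", q=q, scale=scale)

# Later runs: load the cached int8 weights instead of re-converting,
# which is what would save the startup time discussed above.
cached = np.load("weights_int8.npz")
w_restored = dequantize(cached["q"], float(cached["scale"]))
assert np.allclose(w, w_restored, atol=scale)
```

In a real PyTorch setup the same pattern would be `torch.save` on the already-quantized `state_dict()` followed by `load_state_dict` on subsequent launches, trading disk space for startup time.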