Hello Vihang,
I am trying to find out whether I can efficiently run inference with a small LLM (low parameter count) on CPU hardware. Does your 4-bit extension also work on CPUs? As far as I can tell, 4-bit is only mentioned when the device is cuda. Thanks in advance.