Hello Vihang,
I am trying to find out whether I can efficiently run inference with a small LLM (low parameter count) on CPU hardware. Does your 4-bit extension also work on CPUs? As far as I can tell, 4-bit is only mentioned when the device is cuda. Thanks in advance.