pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

Support for quantized LLMs on smaller-memory devices #1024

Closed jhetuts closed 1 month ago

jhetuts commented 2 months ago

🚀 The feature, motivation and pitch

I believe this is one of Ollama's biggest advantages. It would also encourage devs to test LLMs that fit within their machine's capabilities. For example, I have an M1 with 16 GB of RAM, and I can't really enjoy testing the Meta LLMs, especially Llama 3.1.

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

larryliu0820 commented 2 months ago

Hi @jhetuts, thanks for the feedback! Can you give a specific example of the quantized LLM you are referring to? For Llama 3.1 we have a few quantization options; you can refer to this readme. An M1 with 16 GB should be able to run the Llama 3.1 8B model with quantization.
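
As a rough example, running the quantized 8B model from the CLI looks something like the sketch below. The `--quantize` JSON is illustrative; the scheme names and group sizes you should actually use are listed in the quantization readme:

```bash
# Download the model weights (requires accepting the Llama license)
python3 torchchat.py download llama3.1

# Generate with 4-bit weight quantization so the 8B model fits comfortably
# in ~16 GB of RAM; scheme names and group sizes here are examples only
python3 torchchat.py generate llama3.1 \
  --quantize '{"embedding": {"bitwidth": 4, "groupsize": 32}, "linear:int4": {"groupsize": 256}}' \
  --prompt "Hello, my name is"
```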

jhetuts commented 1 month ago

Got it @larryliu0820, I've been testing with this. Thanks!