do-me opened this issue 1 month ago
Running a 405B model on a single PC is super cool! And BitNet can definitely help with that.
Unfortunately, there is no plan for now due to resource constraints. Hopefully we will make it happen in the near future.
We have opened up the discussion section. Enjoy!
In the benchmark graph, there is a 70B model. So is it supported or not?
Good! Please get started on it.
I'm very happy!
I would like a 120B model. I can run 120B models at 2 bits with the usual quantization methods, but I obviously lose a lot of quality, and the result is around the same size as a 70B model at 4 bits. In theory this would be the best of both worlds: a 120B model at 70B size with FP16 120B quality, right? :)
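A quick back-of-the-envelope check of that size comparison (my own arithmetic, counting the weights only and ignoring activations and the KV cache), as a minimal Python sketch:

```python
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory footprint of the weights alone, in GiB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

print(weight_gib(120, 2.0))   # ~27.9 GiB: 120B at 2 bits
print(weight_gib(70, 4.0))    # ~32.6 GiB: 70B at 4 bits
print(weight_gib(120, 1.58))  # ~22.1 GiB: 120B with ternary (1.58-bit) weights
```

So a 2-bit 120B model really is in the same ballpark as a 4-bit 70B model, and a ternary 120B model would be smaller than both.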
But what I am curious about is this: are these LLMs really using 1 bit or 2 bits? Honestly, I'm confused about what 1.58 bits even means. How will we make them even smaller? Is that possible? We had quantization, but now the models already come that way out of the box, so what is next?
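For what it's worth, the 1.58 comes from ternary weights: each weight takes one of three values {-1, 0, +1}, which is log2(3) ≈ 1.58 bits of information per weight. Here is a minimal sketch of the absmean ternarization described in the BitNet b1.58 paper (my own illustration, not this repo's code):

```python
import math
import numpy as np

# Three possible weight values {-1, 0, +1} carry log2(3) bits each.
print(math.log2(3))  # ~1.585 -> the "1.58 bits" in BitNet b1.58

def absmean_ternarize(w: np.ndarray) -> np.ndarray:
    """Round weights to {-1, 0, +1} after scaling by mean(|w|),
    following the absmean scheme from the BitNet b1.58 paper."""
    scale = np.mean(np.abs(w)) + 1e-8
    return np.clip(np.round(w / scale), -1.0, 1.0)

w = np.random.randn(4, 4).astype(np.float32)
print(absmean_ternarize(w))
```

The key difference from ordinary quantization is that BitNet models are trained with ternary weights from the start rather than quantized after training, which is why the low-bit format comes "out of the box".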
No, right now the model does not work. If you have a good model, please push it to the library. Many thanks!
Thanks a lot for open sourcing this amazing library! I was wondering whether you have tried, or are planning, to prepare some larger models too, like Llama-3.1-70B/405B. It seems there is an actual chance of being able to run Llama-3.1-405B on a single Mac. Also, would you mind opening up the discussions section?