microsoft / BitNet

Official inference framework for 1-bit LLMs
MIT License

Larger models (70B, 405B) #8

Open do-me opened 1 month ago

do-me commented 1 month ago

Thanks a lot for open sourcing this amazing library! I was wondering whether you tried/are planning to prepare some larger models too, like Llama-3.1-70B/405B. As it seems, there is an actual chance to be able to run Llama-3.1-405B on a single Mac. Also, would you mind opening up the discussions section?
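For a rough sense of why this seems feasible (back-of-envelope arithmetic, not a claim from the BitNet repo): at roughly 1.58 bits per weight, the weights of a 405B-parameter model come to about 80 GB, which fits in the unified memory of a high-end Mac. This ignores activations, the KV cache, and any packing overhead.

```python
# Rough weight-memory estimate for 1.58-bit (ternary) models.
# A sketch only: ignores activations, KV cache, and packing overhead.
import math

BITS_PER_WEIGHT = math.log2(3)  # ternary {-1, 0, +1} -> ~1.58 bits

def weight_gb(params: float) -> float:
    """Approximate weight storage in GB at ~1.58 bits per weight."""
    return params * BITS_PER_WEIGHT / 8 / 1e9

for name, params in [("70B", 70e9), ("405B", 405e9)]:
    print(f"{name}: ~{weight_gb(params):.0f} GB")
# 70B: ~14 GB
# 405B: ~80 GB
```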

shumingma commented 1 month ago

Running a 405B model on a single PC is super cool! And BitNet can definitely help with that.

Unfortunately, there is no plan for now due to resource constraints. Hopefully we will make it happen in the near future.

We have opened up the discussion section. Enjoy!

dudutwizer commented 1 month ago

In the benchmark graph, there is a 70B model. So, is it supported or not?

Deng-Xian-Sheng commented 1 month ago

Great! Please get it started.

I'm very happy about this.

nonetrix commented 1 month ago

I would like a 120B model. I can already run 120B models at 2 bits with normal quantization methods, but that obviously loses a lot of quality, and it ends up around the same size as a 70B model at 4 bits. In theory, this would be the best of both worlds: a 120B model at 70B size with FP16-level 120B quality, right? :)

But what I am curious about: are these LLMs using 1 bit or 2 bits? Honestly, I'm confused about what 1.58 bits even means. And how will we make them smaller still? Is that possible? We used to rely on quantization, but now the models come that way out of the box, so what's next?
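The 1.58 figure comes from BitNet b1.58's ternary weights: each weight takes one of three values {-1, 0, +1}, and the information content of a three-way choice is log2(3) ≈ 1.58 bits. A small sketch of the idea (the base-3 packing below is purely illustrative, not BitNet's actual storage format):

```python
import math

# Three possible states per weight -> log2(3) ~= 1.585 bits of information.
print(math.log2(3))

# Illustrative packing (NOT BitNet's real format): five ternary weights
# fit in one byte, since 3**5 = 243 <= 256, i.e. 8/5 = 1.6 bits per weight.
def pack5(weights):
    """Pack five ternary weights (-1/0/+1) into one base-3 integer 0..242."""
    assert len(weights) == 5 and all(w in (-1, 0, 1) for w in weights)
    n = 0
    for w in weights:
        n = n * 3 + (w + 1)  # map -1/0/+1 -> base-3 digit 0/1/2
    return n

def unpack5(n):
    """Inverse of pack5: recover the five ternary weights."""
    digits = []
    for _ in range(5):
        digits.append(n % 3 - 1)
        n //= 3
    return digits[::-1]

ws = [1, 0, -1, -1, 1]
assert unpack5(pack5(ws)) == ws  # lossless round trip
```

So "1.58-bit" is the theoretical floor for ternary weights; any real packing scheme lands slightly above it (1.6 bits/weight in this sketch).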

Deng-Xian-Sheng commented 1 month ago

No, the model does not work for me right now. If you have a working model, please push it to a repository. Many thanks!