Open younesbelkada opened 3 months ago
@younesbelkada, hey, thanks for the suggestion. I've talked with @NouamaneTazi; we agree that we will add 1-bit support for consumer hardware later on, because FP8 is the most compelling right now: you get a speedup in training (FP8 matmul, which is very important), memory reduction, and it's tested at scale (180B). So, currently, we focus on FP8 :)
Hi there!
Microsoft has just released the full handbook for reproducing the 1-bit LLM paper: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
It would be exciting to have an official implementation of that paper in nanotron, and to support 1-bit LLM inference directly in transformers for models trained with that method using nanotron.
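For context, the core of the method is the absmean weight quantization from the BitNet b1.58 paper: weights are scaled by the mean of their absolute values, rounded, and clipped to the ternary set {-1, 0, +1}. Here is a minimal NumPy sketch of that scheme (the function name and `eps` guard are my own; this is an illustration, not the official code):

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Sketch of absmean ternary quantization (BitNet b1.58 style):
    scale by mean(|W|), round, then clip to {-1, 0, +1}."""
    gamma = np.abs(w).mean()                         # per-tensor scale
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return w_q.astype(np.int8), gamma                # dequantize as w_q * gamma

# Toy usage with made-up weights
w = np.array([[0.4, -0.05, 1.2], [-0.8, 0.02, 0.3]])
w_q, gamma = absmean_ternary_quantize(w)
```

The resulting ternary weights turn the matmul into additions/subtractions, which is where the inference speedup and memory reduction come from.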
cc @NouamaneTazi @xrsrke @3outeille @thomwolf
cc original author: @shumingma