huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

FEAT: Support 1.58-bit LLMs training #114

Open younesbelkada opened 3 months ago

younesbelkada commented 3 months ago

Hi there!


Microsoft has just released the full handbook for reproducing the 1-bit LLM paper: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf

It would be exciting to have an official implementation of that paper in nanotron, and to support 1-bit LLM inference directly in transformers for models that have been trained with that method using nanotron.
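
For reference, the core recipe in the handbook comes down to a `BitLinear` layer: weights are fake-quantized to {-1, 0, +1} with absmean scaling, activations to 8-bit with per-token absmax, and a straight-through estimator keeps gradients flowing to the full-precision master weights. A rough PyTorch sketch of that idea (the names `weight_quant` / `activation_quant` / `BitLinear` here are illustrative, not existing nanotron or transformers API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Ternary (1.58-bit) weight quantization via absmean scaling.
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale


def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token 8-bit absmax quantization of activations.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale


class BitLinear(nn.Linear):
    # Drop-in nn.Linear replacement: fake-quantize weights and activations
    # in the forward pass; the straight-through estimator (the `.detach()`
    # trick) lets gradients update the full-precision master weights.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        x_q = x + (activation_quant(x) - x).detach()
        w_q = w + (weight_quant(w) - w).detach()
        return F.linear(x_q, w_q, self.bias)
```

Swapping the `nn.Linear` layers in the attention and MLP blocks for something like this (plus the norm placement and hyperparameter tips from the PDF) would presumably be the bulk of a nanotron integration.
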

cc @NouamaneTazi @xrsrke @3outeille @thomwolf

cc original author: @shumingma

xrsrke commented 3 months ago

@younesbelkada, hey, thanks for the suggestion. I've talked with @NouamaneTazi; we agree we'll add 1-bit support for consumer hardware later on, because FP8 is the more compelling path right now: you get a speedup in training (FP8 matmul, which is very important), memory reduction, and it's been tested at scale (180B). So, for now, we're focusing on FP8 :)
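
(For context on the memory side of that argument: recent PyTorch builds expose native 8-bit float dtypes. A tiny illustration, assuming `torch.float8_e4m3fn` is available on your build; the actual speedup comes from FP8 matmul kernels, e.g. `torch._scaled_mm` or Transformer Engine, which this snippet does not show.)

```python
import torch

# Full-precision master weights (bf16 is the usual training dtype).
w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)

# Cast to FP8 (e4m3) as would be done for matmul inputs.
w_fp8 = w_bf16.to(torch.float8_e4m3fn)

print(w_bf16.element_size(), "bytes/elem vs", w_fp8.element_size(), "byte/elem")
# 2 bytes/elem vs 1 byte/elem -> roughly 2x memory reduction on those tensors
```
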