huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Fixes: https://github.com/huggingface/nanotron/issues/114 #179

Closed MekkCyber closed 4 months ago

MekkCyber commented 4 months ago

Implementation of a 1.58-bit LLM with Llama, following the paper and training handbook released by Microsoft:

https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
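For context, the linked handbook describes the core BitNet b1.58 recipe: weights are quantized to ternary values {-1, 0, 1} with an absmean scale, activations are quantized per token to 8 bits with an absmax scale, and a straight-through estimator lets gradients flow to the full-precision master weights. A minimal sketch of such a `BitLinear` layer (following the handbook's published reference code, not necessarily this PR's exact implementation) could look like:

```python
import torch
import torch.nn.functional as F


def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Absmean quantization: scale by mean |w|, round, clip to {-1, 0, 1},
    # then rescale back so the layer keeps its original magnitude.
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale


def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization of activations to 8-bit range [-128, 127].
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale


class BitLinear(torch.nn.Linear):
    """Drop-in nn.Linear replacement: the forward pass uses quantized
    weights/activations, while `x + (q(x) - x).detach()` implements the
    straight-through estimator so the backward pass updates fp weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        x_q = x + (activation_quant(x) - x).detach()
        w_q = w + (weight_quant(w) - w).detach()
        return F.linear(x_q, w_q, self.bias)
```

The handbook also recommends applying RMSNorm to the input before quantization; that step is omitted here for brevity.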

cc @NouamaneTazi @xrsrke @thomwolf