huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Fixes: https://github.com/huggingface/nanotron/issues/114 #178

Closed MekkCyber closed 5 months ago

MekkCyber commented 5 months ago

Implementation of a 1.58-bit LLM (BitNet b1.58) with Llama, following the paper & handbook released by Microsoft:

https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
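For context, the core of the 1.58-bit scheme described in the handbook is ternary weight quantization (weights rounded to {-1, 0, 1} after scaling by the mean absolute value) combined with per-token 8-bit absmax activation quantization, applied with a straight-through estimator during training. A minimal PyTorch sketch of those two quantizers (function names `weight_quant` / `activation_quant` follow the handbook's reference code; this is an illustration, not the exact nanotron implementation):

```python
import torch


def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Absmean quantization: scale by 1 / mean(|W|), round to the
    # ternary set {-1, 0, 1}, then rescale back to the original range.
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale


def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization to 8 bits: each token (last dim)
    # is scaled so its max magnitude maps to 127, rounded, rescaled.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale


class BitLinear(torch.nn.Linear):
    # Drop-in Linear that quantizes weights and activations on the fly,
    # using the straight-through estimator (detach trick) so gradients
    # flow through the full-precision latent weights.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        x_q = x + (activation_quant(x) - x).detach()
        w_q = w + (weight_quant(w) - w).detach()
        return torch.nn.functional.linear(x_q, w_q, self.bias)
```

In inference the quantized weights can be materialized once and stored as ternary values plus a single scale per tensor, which is where the ~1.58 bits per weight (log2 of 3 states) comes from.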

cc @NouamaneTazi @xrsrke @thomwolf