huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Fixes: https://github.com/huggingface/nanotron/issues/114 #178

Closed MekkCyber closed 5 months ago

MekkCyber commented 5 months ago

Implementation of a 1.58-bit LLM (BitNet b1.58) with Llama, following the paper & handbook released by Microsoft:

https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
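For context, the core of the 1.58-bit scheme described in the handbook is ternary weight quantization (weights rounded to {-1, 0, 1} after scaling by the mean absolute value) combined with per-token 8-bit absmax activation quantization, applied with a straight-through estimator during training. A minimal PyTorch sketch of those two quantizers (function names `weight_quant` / `activation_quant` follow the handbook's reference code; this is an illustration, not the exact nanotron implementation):

```python
import torch


def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Absmean quantization: scale by 1 / mean(|W|), round to the
    # ternary set {-1, 0, 1}, then rescale back to the original range.
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale


def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization to 8 bits: each token (last dim)
    # is scaled so its max magnitude maps to 127, rounded, rescaled.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale


class BitLinear(torch.nn.Linear):
    # Drop-in Linear that quantizes weights and activations on the fly,
    # using the straight-through estimator (detach trick) so gradients
    # flow through the full-precision latent weights.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        x_q = x + (activation_quant(x) - x).detach()
        w_q = w + (weight_quant(w) - w).detach()
        return torch.nn.functional.linear(x_q, w_q, self.bias)
```

In inference the quantized weights can be materialized once and stored as ternary values plus a single scale per tensor, which is where the ~1.58 bits per weight (log2 of 3 states) comes from.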

cc @NouamaneTazi @xrsrke @thomwolf