huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Fixes: https://github.com/huggingface/nanotron/issues/114 #179

Closed MekkCyber closed 4 months ago

MekkCyber commented 4 months ago

Implementation of a 1.58-bit LLM with Llama, following the paper and training handbook released by Microsoft:

https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
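For context, the linked handbook describes the core BitNet b1.58 recipe: weights are quantized to ternary values {-1, 0, 1} with an absmean scale, activations are quantized per token to 8 bits with an absmax scale, and a straight-through estimator lets gradients flow to the full-precision master weights. A minimal sketch of such a `BitLinear` layer (following the handbook's published reference code, not necessarily this PR's exact implementation) could look like:

```python
import torch
import torch.nn.functional as F


def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Absmean quantization: scale by mean |w|, round, clip to {-1, 0, 1},
    # then rescale back so the layer keeps its original magnitude.
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale


def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization of activations to 8-bit range [-128, 127].
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale


class BitLinear(torch.nn.Linear):
    """Drop-in nn.Linear replacement: the forward pass uses quantized
    weights/activations, while `x + (q(x) - x).detach()` implements the
    straight-through estimator so the backward pass updates fp weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        x_q = x + (activation_quant(x) - x).detach()
        w_q = w + (weight_quant(w) - w).detach()
        return F.linear(x_q, w_q, self.bias)
```

The handbook also recommends applying RMSNorm to the input before quantization; that step is omitted here for brevity.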

cc @NouamaneTazi @xrsrke @thomwolf