huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

[LARGE] Bring all recent updates from brrr – reducing dependencies #7

Closed thomwolf closed 9 months ago

thomwolf commented 9 months ago

Goals:

thomwolf commented 9 months ago

OK, it now all runs for training. The example can be run by just doing `bash ./examples/train_tiny_llama.sh`.
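For readers unfamiliar with the example, here is a minimal sketch of what a script like `train_tiny_llama.sh` typically wraps: a `torchrun` launch of a training entrypoint with a tiny Llama config. The entrypoint name, config path, and process count below are assumptions for illustration, not necessarily the script's actual contents in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of ./examples/train_tiny_llama.sh.
# Assumed: a training entrypoint (run_train.py) and a small Llama config file;
# both names are placeholders and may differ from the real script.
set -euo pipefail

# Single-node launch with 8 processes (one per GPU); adjust to your hardware.
torchrun --nproc_per_node=8 \
    run_train.py \
    --config-file examples/config_tiny_llama.yaml
```

The script form keeps the launch command, parallelism degree, and config in one place, so the example stays reproducible with a single `bash` invocation.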

Maybe we can merge this to use it as a base for the other PRs.

Cc @NouamaneTazi. It's quite hard to review unfortunately, sorry.

thomwolf commented 9 months ago

Merging for now, and we can iterate from main for further tweaks and fixes.