huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Remove Apex dependency #12

Closed: NouamaneTazi closed this 10 months ago

NouamaneTazi commented 10 months ago

Tested that this works by running:

USE_FAST=1 CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --rdzv-backend=c10d --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml

3outeille commented 10 months ago

Looks good to me.