pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License
1.28k stars 115 forks

[do NOT land][experiment] use local_map to annotate TritonFusedRMSNorm #363

Closed XilunWu closed 1 month ago

XilunWu commented 1 month ago

Stack from ghstack (oldest at bottom):

Test Plan

unit test:
torchrun --nproc_per_node=4 --rdzv_backend c10d --rdzv_endpoint="localhost:0" test_fused_rms_norm.py

llama training test:
CONFIG_FILE=./train_configs/debug_model.toml NGPU=4 LOG_RANK=0,1,2,3 ./run_llama_train.sh