pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

Enable CP #433

Open fegin opened 3 months ago

fegin commented 3 months ago

Stack from ghstack (oldest at bottom):

This PR adds experimental flags and functions to enable context parallelism (CP). Only two configurations are currently supported: FSDP + CP, and CP alone. CP + TP is still being tested.
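For readers new to CP, here is a minimal sketch of the general idea in plain Python. It is not the code in this PR. It assumes contiguous sharding along the sequence dimension and a 2D FSDP x CP rank layout with CP as the innermost dimension; an actual implementation may shard differently (for example, with load balancing for causal attention). The names `shard_sequence`, `rank_to_coords`, `cp_degree`, and `dp_shard_degree` are hypothetical and chosen only for illustration.

```python
def shard_sequence(tokens, cp_degree, cp_rank):
    """Return the contiguous slice of `tokens` owned by `cp_rank`.

    Assumption: the sequence is split into `cp_degree` equal, contiguous chunks.
    """
    if not 0 <= cp_rank < cp_degree:
        raise ValueError("cp_rank must be in [0, cp_degree)")
    if len(tokens) % cp_degree != 0:
        raise ValueError("sequence length must be divisible by cp_degree")
    chunk = len(tokens) // cp_degree
    return tokens[cp_rank * chunk:(cp_rank + 1) * chunk]


def rank_to_coords(rank, dp_shard_degree, cp_degree):
    """Map a global rank to (dp_shard_rank, cp_rank).

    Assumption: ranks are laid out row-major on a (dp_shard, cp) grid,
    so CP is the innermost (fastest-varying) dimension.
    """
    if not 0 <= rank < dp_shard_degree * cp_degree:
        raise ValueError("rank out of range for this mesh")
    return divmod(rank, cp_degree)


if __name__ == "__main__":
    tokens = list(range(8))  # stand-in for one sequence of 8 token ids
    for rank in range(8):    # 2 FSDP shards x 4 CP ranks
        dp_rank, cp_rank = rank_to_coords(rank, dp_shard_degree=2, cp_degree=4)
        print(rank, dp_rank, cp_rank, shard_sequence(tokens, 4, cp_rank))
```

With CP alone, `dp_shard_degree` would be 1 and every rank holds a different slice of the same sequence.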