pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

Change debugmodel to have 8 layers #403

Closed: wconstab closed this 3 months ago

wconstab commented 3 months ago

Stack from ghstack (oldest at bottom):


This is useful for pipeline parallelism (PP): more layers give more possible schedules and values of num_stages. We don't need the debug model to be large in terms of parameter count, only deep enough to split into several stages.
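To see why more layers widen the PP options, here is a minimal sketch that lists the stage counts a model can be split into. It assumes every stage gets an equal, contiguous, non-empty block of layers. That is a simplifying assumption, not necessarily how torchtitan splits models. `even_stage_counts` and `split_layers` are hypothetical helpers written for this illustration and are not torchtitan APIs.

```python
def even_stage_counts(n_layers: int) -> list[int]:
    """Return every stage count that splits n_layers into equal, non-empty stages.

    Assumption (illustration only): stages must hold the same number of layers.
    """
    return [s for s in range(1, n_layers + 1) if n_layers % s == 0]


def split_layers(n_layers: int, num_stages: int) -> list[range]:
    """Assign contiguous layer indices to each stage (hypothetical helper)."""
    if n_layers % num_stages != 0:
        raise ValueError(
            f"{n_layers} layers do not split evenly into {num_stages} stages"
        )
    per_stage = n_layers // num_stages
    return [range(i * per_stage, (i + 1) * per_stage) for i in range(num_stages)]


# Compare a shallow model with an 8-layer one.
for n_layers in (2, 8):
    print(n_layers, "layers ->", even_stage_counts(n_layers))
# 2 layers -> [1, 2]
# 8 layers -> [1, 2, 4, 8]

# Four stages of two layers each.
print([list(r) for r in split_layers(8, 4)])
# [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Looped or interleaved schedules place several stages on each pipeline rank, so the total stage count grows quickly. For example, 2 ranks with 2 stages per rank need 4 stages. Under the sketch's assumptions, 8 layers can support that, but 2 layers cannot.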