pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License
recommended practices for loss converging tests #695

Closed: tianyu-l closed this 2 hours ago

tianyu-l commented 2 days ago

Stack from ghstack (oldest at bottom):

The actual convergence experiments will be added in a later update.