pytorch / torchtitan

A native PyTorch library for large model training
BSD 3-Clause "New" or "Revised" License

expose optimizer params, log optimizer type and settings for the run #365

Open lessw2020 opened 1 month ago

lessw2020 commented 1 month ago

This PR addresses the TODO in train.py about exposing the optimizer params to TOML/command-line configuration.

The three settings added are weight decay, beta1, and beta2. In addition, it adds a single line of logging that shows which optimizer and which optimizer settings are being used for the run:

[Screenshot (2024-05-26): training log line showing the optimizer type and settings for the run]
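A minimal sketch of the idea, not torchtitan's actual implementation: read the optimizer hyperparameters from a config mapping (as a TOML section would be parsed into) and emit one log line summarizing the choice. The field names (`optimizer`, `lr`, `weight_decay`, `beta1`, `beta2`) and defaults here are illustrative assumptions, not the PR's exact schema.

```python
import logging

import torch

logger = logging.getLogger(__name__)


def build_optimizer(model: torch.nn.Module, cfg: dict) -> torch.optim.Optimizer:
    """Construct the optimizer from config values and log its settings.

    Hypothetical config keys for illustration; defaults are assumptions.
    """
    name = cfg.get("optimizer", "AdamW")
    lr = cfg.get("lr", 3e-4)
    weight_decay = cfg.get("weight_decay", 0.1)
    betas = (cfg.get("beta1", 0.9), cfg.get("beta2", 0.95))

    if name == "AdamW":
        optimizer = torch.optim.AdamW(
            model.parameters(), lr=lr, betas=betas, weight_decay=weight_decay
        )
    elif name == "Adam":
        optimizer = torch.optim.Adam(
            model.parameters(), lr=lr, betas=betas, weight_decay=weight_decay
        )
    else:
        raise NotImplementedError(f"optimizer {name} not supported")

    # The single summary line the PR describes: optimizer type plus settings.
    logger.info(
        "Optimizer: %s (lr=%g, betas=%s, weight_decay=%g)",
        name, lr, betas, weight_decay,
    )
    return optimizer


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    model = torch.nn.Linear(4, 4)
    opt = build_optimizer(model, {"optimizer": "AdamW", "beta2": 0.95})
```

Keeping the hyperparameters in config rather than hard-coded means a run's optimizer settings are fully reproducible from its TOML file plus command-line overrides, and the startup log line makes them visible without inspecting the config.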