pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

Synced estimate.py with train.py #424

Closed sanketpurandare closed 3 months ago

sanketpurandare commented 3 months ago

Stack from ghstack (oldest at bottom):

sanketpurandare commented 3 months ago

The change itself seems good to me. I wonder what the approach will be in the future if train.py continues to change though.

  1. I think the best way would be to incorporate this into train.py directly and use the estimate config options to enable and disable the right parts of the code in the main workflow. That way we don't have to maintain two copies.
  2. On another note, @gnadathur suggested, and it seems pretty reasonable, that estimate.py should evolve into an option that auto-configures the run and outputs a configuration to use.
  3. For now I replicated some things because I don't yet have user feedback on what they want. @lessw2020 is going to advertise this tool to partner teams, who may give us feedback about how they want to use it.
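
To make option 1 concrete, here is a rough sketch of what a single entry point gated by a config flag could look like. Everything in it is hypothetical: `JobConfig`, `estimate_only`, `build_model`, and `run` are placeholder names for illustration, not torchtitan's actual config or code, and the memory formula is a stand-in.

```python
from dataclasses import dataclass


@dataclass
class JobConfig:
    # Hypothetical config for illustration; not torchtitan's real JobConfig.
    estimate_only: bool = False
    steps: int = 3


def build_model(config: JobConfig) -> dict:
    # Stand-in for the setup that training and estimation would share.
    return {"params": 1_000}


def run(config: JobConfig) -> str:
    model = build_model(config)  # shared path, written only once
    if config.estimate_only:
        # Estimation branch: report a number instead of training.
        # 4 bytes per parameter (fp32) is a placeholder formula.
        return f"estimated bytes: {model['params'] * 4}"
    for _ in range(config.steps):
        pass  # the real training step would go here
    return f"trained for {config.steps} steps"


print(run(JobConfig(estimate_only=True)))
print(run(JobConfig()))
```

The point is only that the shared setup lives in one place and the flag picks the branch, so changes to the common path would not need to be mirrored in a second script.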

I am open to other suggestions as well.

cc: @awgu @tianyu-l