Closed sanketpurandare closed 3 months ago
Previous conversation:
The change itself seems good to me. I wonder what the approach will be in the future if `train.py` continues to change though.
... `train.py` directly and use the estimate config options to enable and disable the right parts of the code in the main workflow. That way we don't have to maintain two copies.
... `estimate.py` to evolve into an option that auto configures stuff and outputs a configuration to run.
I am open to other suggestions as well.
cc: @awgu @tianyu-l
Originally posted by @sanketpurandare in https://github.com/pytorch/torchtitan/issues/424#issuecomment-2189676294
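For illustration only, here is a rough sketch of the first idea from the conversation above: a single entry point where an estimation option switches parts of the main workflow on and off, so there is only one copy to maintain. All names here (`EstimateConfig`, `run_workflow`, the step labels) are hypothetical and are not taken from `train.py` or `estimate.py`; the real workflow would build the model, optimizer and so on where this sketch only records step names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EstimateConfig:
    # Hypothetical option; stands in for an "estimate" config section.
    enabled: bool = False


def run_workflow(cfg: EstimateConfig, steps: int = 3) -> List[str]:
    """Shared entry point: returns the names of the steps it would run."""
    # Setup is common to both training and estimation.
    executed = ["build_model", "build_optimizer"]

    if cfg.enabled:
        # Estimation path: report memory and stop before real training.
        executed.append("report_memory_estimate")
        return executed

    # Training path: run the usual loop.
    for step in range(steps):
        executed.append(f"train_step_{step}")
    executed.append("save_checkpoint")
    return executed


if __name__ == "__main__":
    print(run_workflow(EstimateConfig(enabled=True)))
    print(run_workflow(EstimateConfig(enabled=False), steps=2))
```

The tradeoff this is meant to show: the shared setup stays in one place, at the cost of branching inside the main workflow.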
Stack from ghstack (oldest at bottom):
-> #425
pytorch/pytorch#125323
Command (passing `--memory_estimation.disable_fake_mode` turns off fake mode and does an actual single-GPU run with a fake process group):
./run_memory_estimation.sh --memory_estimation.enabled --memory_estimation.disable_fake_mode
Output: