pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

Fix 1D PP tracer test #362

Closed wconstab closed 3 weeks ago

wconstab commented 1 month ago

Forgot to enable the tracer for the tracer test in the last PR.

kwen2501 commented 1 month ago

CI should pass once https://github.com/pytorch/pytorch/pull/127607 lands.

wconstab commented 3 weeks ago

Made a small change: In tracer mode, we don't require users to provide manual split points, because that is being taken care of by the pipeline_llama_tracer function today:

I don't think this is the right way to do it. If we want the layer split to be 'automatic', it should be automatic for both frontends, and we can delete the _split_points cmdline arg.

Or, if we want to keep the cmdline arg, it should behave the same for both frontends.

I'd propose first keeping the arg for both frontends, then doing a follow-up PR that makes the cmdline arg optional and uses automation for both frontends.
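To make the proposal concrete, here is a rough sketch of what "optional arg, same behavior for both frontends" could look like. This is not torchtitan code: `resolve_split_points`, its parameters, the even-split fallback, and the `layers.N` naming are all assumptions for illustration. The idea is that both the manual and the tracer frontend would call one shared helper, so the arg means the same thing in either mode.

```python
from typing import List, Optional


def resolve_split_points(
    manual_split_points: Optional[List[str]],
    num_layers: int,
    num_stages: int,
) -> List[str]:
    """Hypothetical helper shared by both frontends.

    Returns the layer names at which each stage after the first begins.
    The "layers.N" naming is an assumption made for this sketch.
    """
    if num_stages < 1 or num_stages > num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")

    # User supplied split points: honor them identically in both frontends.
    if manual_split_points:
        if len(manual_split_points) != num_stages - 1:
            raise ValueError(
                f"expected {num_stages - 1} split points, "
                f"got {len(manual_split_points)}"
            )
        return list(manual_split_points)

    # No split points given: fall back to an even split (assumed policy),
    # handing any leftover layers to the earliest stages.
    base, extra = divmod(num_layers, num_stages)
    points = []
    start = 0
    for stage in range(num_stages - 1):
        start += base + (1 if stage < extra else 0)
        points.append(f"layers.{start}")
    return points
```

With something like this, neither frontend would need its own special case for a missing arg; the only difference between them would be how the resolved split points are applied to the model.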

kwen2501 commented 3 weeks ago

Sounds good to me