Runs PP+DP and PP+TP without issue.
Runs PP+TP+DP with decreasing loss, but fails on DCP save.
Currently supports only simple schedules: GPipe and 1F1B.
Adds a cmdline/toml arg for specifying split points, in a way that is
unified between the tracer and manual frontends,
e.g. the user can specify "layers.2,layers.4" as split points.
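
To illustrate how a split-point string like the one above could map onto pipeline stages, here is a minimal sketch in plain Python. It is not the code in this PR: the helper names `parse_split_points` and `assign_stages` are hypothetical, and it assumes each split point marks the first module of a new stage.

```python
def parse_split_points(spec: str) -> list[str]:
    """Turn a comma-separated spec such as "layers.2,layers.4" into a list of names."""
    return [part.strip() for part in spec.split(",") if part.strip()]


def assign_stages(module_names: list[str], split_points: list[str]) -> list[list[str]]:
    """Group module names into consecutive stages.

    Assumption (hypothetical, for illustration): each split point names the
    first module of a new stage, and modules appear in execution order.
    """
    unknown = [p for p in split_points if p not in module_names]
    if unknown:
        raise ValueError(f"unknown split points: {unknown}")

    stages: list[list[str]] = [[]]
    for name in module_names:
        # Start a new stage when we reach a split point, unless the
        # current stage is still empty (split point at the very start).
        if name in split_points and stages[-1]:
            stages.append([])
        stages[-1].append(name)
    return stages


# Hypothetical module layout for a small 6-layer transformer.
modules = ["tok_embeddings"] + [f"layers.{i}" for i in range(6)] + ["norm", "output"]
stages = assign_stages(modules, parse_split_points("layers.2,layers.4"))
for idx, stage in enumerate(stages):
    print(idx, stage)
```

With two split points the model is cut into three stages, so this spec would pair with a pipeline-parallel degree of 3 under the assumption above.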
Currently uses the manual frontend by default, but allows specifying the
tracer frontend. The tracer frontend requires working around additional
compatibility limitations, which are flagged by raised assertions, and is
not ready for wider use yet.
@wconstab looks like CI is failing now. Is it because the APIs for PP are not in the nightly yet? If so, we should probably wait until they land in the nightly and then reland this.