wconstab closed 1 month ago
Nice, but it is less intuitive than I originally thought, especially the int/str conversion part. I'm not sure whether this is the best UX for pippy, or whether a customized PipelineModuleList would be easier for users.
One downside to using `ModuleDict` is that now the model print does not collapse `TransformerBlock`s together, making the model print very long.
Stack from ghstack (oldest at bottom):
318
321
A few small changes here let the manual PP frontend 'reconfigure' a whole transformer model down to a single stage's portion, simply by setting undesired layers to None (for top-level layers) or deleting them from the ModuleDict (for 'layers.*').
These changes don't impact the FQNs of the remaining layers, which is critical for checkpoint load/save compatibility.
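To illustrate the idea, here is a minimal sketch, not the code from this PR. `TinyTransformer`, `keep_stage`, and the attribute names `tok_embeddings`, `norm`, and `output` are hypothetical stand-ins, and `nn.Linear` stands in for a real `TransformerBlock`. It assumes the model stores its blocks in an `nn.ModuleDict` keyed by the stringified layer index and guards each top-level module against `None` in `forward`.

```python
import torch.nn as nn


class TinyTransformer(nn.Module):
    """Hypothetical stand-in for the real model; all names are illustrative."""

    def __init__(self, n_layers: int = 4, dim: int = 8, vocab: int = 16):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab, dim)
        # ModuleDict keys must be strings, hence the int -> str conversion.
        self.layers = nn.ModuleDict(
            {str(i): nn.Linear(dim, dim) for i in range(n_layers)}
        )
        self.norm = nn.LayerNorm(dim)
        self.output = nn.Linear(dim, vocab)

    def forward(self, h):
        # Any top-level module may be None on a given stage, so guard each one.
        if self.tok_embeddings is not None:
            h = self.tok_embeddings(h)
        for layer in self.layers.values():
            h = layer(h)
        if self.norm is not None:
            h = self.norm(h)
        if self.output is not None:
            h = self.output(h)
        return h


def keep_stage(model, keep_layers, is_first, is_last):
    """Trim a full model in place down to one pipeline stage's portion."""
    # Copy the keys first so entries can be deleted while looping.
    for name in list(model.layers.keys()):
        if int(name) not in keep_layers:
            del model.layers[name]
    # Top-level modules are set to None rather than deleted.
    if not is_first:
        model.tok_embeddings = None
    if not is_last:
        model.norm = None
        model.output = None
    return model


full_keys = set(TinyTransformer().state_dict().keys())
stage = keep_stage(TinyTransformer(), keep_layers={2, 3}, is_first=False, is_last=True)
stage_keys = set(stage.state_dict().keys())

# The surviving parameters keep the names they had in the full model.
print(sorted(stage_keys))
```

Because the surviving blocks keep their original keys (`layers.2.*`, `layers.3.*`) instead of being renumbered from zero, the stage's `state_dict` keys are a subset of the full model's keys, which is what keeps checkpoints interchangeable between the full model and its stages.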