Hi @sanchit-gandhi
Was wondering if you could explain the standard partitioning rules? In particular, how are `activation` and `parameter` parallelism achieved through the various `logical_axis_rules` combinations of `activation_dims` and `parameter_dims`?
I've read the t5x partitioning documentation and the section on canonical axis rules but am still confused.