haocizhang opened 4 months ago
Enabled some cases to work where `num_microbatches % pp_size != 0`. Using the flex_pp schedule, we have `num_rounds = max(1, n_microbatches // pp_group_size)`, and it works as long as `n_microbatches % num_rounds == 0`. As a few examples, support

Tested using the config in (1); the schedule looks like the following graph:
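To make the condition concrete, here is a minimal standalone sketch of the round computation and divisibility check described above. It is not the flex_pp implementation. `num_rounds` mirrors the formula quoted in the description, and `is_supported` is a hypothetical helper name. The sketch assumes that `pp_group_size` is the pipeline-parallel group size (the same quantity as `pp_size`).

```python
def num_rounds(n_microbatches: int, pp_group_size: int) -> int:
    # Formula quoted from the PR description. max(1, ...) keeps at least one
    # round when there are fewer microbatches than pipeline stages.
    return max(1, n_microbatches // pp_group_size)


def is_supported(n_microbatches: int, pp_group_size: int) -> bool:
    # Hypothetical helper: a config works when the microbatches split
    # evenly across the rounds, i.e. n_microbatches % num_rounds == 0.
    return n_microbatches % num_rounds(n_microbatches, pp_group_size) == 0


if __name__ == "__main__":
    pp_group_size = 4  # assumed example pipeline size
    supported = [n for n in range(1, 17) if is_supported(n, pp_group_size)]
    print(supported)  # [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 16]
```

With 4 stages, 10 microbatches give 2 rounds (10 % 2 == 0), so that config works even though 10 % 4 != 0. With 9 microbatches there are also 2 rounds, but 9 % 2 == 1, so it is rejected.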
n00b question: how do we assign received tensors to the corresponding model chunk?