This is in reference to this function in runtime.py:
def num_iterations(self, loader_size):
    """ Determines number of iterations for this stage
    TODO: don't currently support uneven configurations.
    """
    if self.stage == 0 or self.stage is None:
        return loader_size
    num_iterations = loader_size * self.num_ranks_in_first_stage
    assert num_iterations % self.num_ranks_in_stage == 0
    num_iterations = num_iterations // self.num_ranks_in_stage
    return num_iterations
From my understanding, for this function not to raise an AssertionError, the total number of batches in the dataset must be a multiple of the replication factor of every stage except the first. However, PipeDream's optimizer module gives no guarantee that the replication factors it assigns satisfy this constraint. As a result, the framework is sometimes unable to execute training at all because of this limitation.
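As a concrete illustration, here is a minimal sketch of the divisibility requirement using made-up replication factors (these values are hypothetical, not produced by PipeDream's optimizer):

```python
# Hypothetical configuration, for illustration only.
loader_size = 50              # batches per epoch seen by each first-stage rank
num_ranks_in_first_stage = 3  # replication factor of stage 0
num_ranks_in_stage = 4        # replication factor of a later stage

total_batches = loader_size * num_ranks_in_first_stage   # 150
if total_batches % num_ranks_in_stage != 0:
    # This is exactly the case in which the assert in num_iterations() fires:
    # 150 is not divisible by 4, so training cannot proceed.
    print("AssertionError would be raised")
else:
    print("num_iterations =", total_batches // num_ranks_in_stage)
```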