nict-wisdom opened this issue 4 years ago
You can use this function for planning: https://github.com/msr-fiddle/pipedream/blob/pipedream_2bw/planner/planner.py#L33.
For the performance and memory cost functions, you might want to use direct measurements (from running 100 or so iterations of the respective configuration).
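As a rough illustration of such a direct measurement (my own sketch, not code from the repo), a PyTorch timing loop could look like the following. The make_transformer_block helper and the tensor shapes are hypothetical placeholders:

```python
import time
import torch

# Hypothetical block and shapes -- substitute your own model pieces.
batch_size, seq_len, hidden = 8, 512, 1024
block = make_transformer_block().cuda()  # assumed helper, not part of the repo
x = torch.randn(batch_size, seq_len, hidden, device="cuda")

# Warm up so CUDA kernels and the allocator are initialized before timing.
for _ in range(10):
    block(x).sum().backward()
torch.cuda.synchronize()

# Time ~100 forward+backward iterations, as suggested above.
iters = 100
start = time.time()
for _ in range(iters):
    block(x).sum().backward()
torch.cuda.synchronize()

computation_time_per_block = (time.time() - start) / iters
print(f"computation_time_per_block: {computation_time_per_block:.4f} s")
```

For the memory cost, querying torch.cuda.max_memory_allocated() after such a run gives a direct measurement in the same spirit.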
Thank you for your very quick answer!
I was wondering how I can get values for some of the arguments: computation_time_per_block, num_parameters_per_block, num_activations_per_block, and output_activation_size. In particular, what do num_activations_per_block and output_activation_size refer to? I would appreciate it if you could answer these questions.
num_activations_per_block is the size of the intermediate activations needed in a transformer block during training. output_activation_size is the size of the intermediate activations sent between workers. Note that you can get these by profiling your model.
And yes, we're assuming that these are transformer models where the transformer blocks are repeated some number of times.
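As a rough sketch of that profiling (my own illustration, not the repo's profiler), one could count parameters directly and capture activation sizes with forward hooks. The nn.TransformerEncoderLayer here is just a stand-in for your actual block:

```python
import torch
import torch.nn as nn

# Stand-in for one transformer block; replace with your real block.
hidden = 1024
block = nn.TransformerEncoderLayer(d_model=hidden, nhead=16)
x = torch.randn(512, 8, hidden)  # (seq_len, batch, hidden)

# num_parameters_per_block: total parameter count of one block.
num_parameters_per_block = sum(p.numel() for p in block.parameters())

# num_activations_per_block: sum of submodule output sizes, captured with
# forward hooks -- a rough proxy for the activations stashed during training.
activation_counts = []
def hook(module, inputs, output):
    if torch.is_tensor(output):
        activation_counts.append(output.numel())

handles = [m.register_forward_hook(hook) for m in block.modules()]
out = block(x)
for h in handles:
    h.remove()
num_activations_per_block = sum(activation_counts)

# output_activation_size: size of the tensor sent to the next worker.
output_activation_size = out.numel()

print(num_parameters_per_block, num_activations_per_block, output_activation_size)
```

These are element counts; if the planner expects sizes in bytes, multiply by the element size (e.g. 2 for fp16) -- check planner.py for the expected units.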
Thank you again for your kind support! I understand that PipeDream-2BW assumes uniform layers.
I have a related question about PipeDream (the earlier version). From my understanding of the paper, PipeDream can allocate different numbers of GPUs to different stages (unlike PipeDream-2BW). My question is whether the implementation supports such allocations.
When I try, the optimizer (optimizer_graph_hierarchical.py) does in fact produce such allocations. However, the runtime often blocks with such an allocation. (One of the reasons is gradient synchronization among processes in the same stage, but there must be other reasons as well.)
Moreover, I found the following comment:
TODO: don't current support uneven configurations.
Does "uneven configurations" mean allocating different numbers of GPUs to stages?
When I use a certain number of GPUs (8/16/32) to train ResNet, most of the generated configurations block soon after training starts. Could you tell me how to solve this, or is it possible to generate safe configurations?
> You can use this function for planning: https://github.com/msr-fiddle/pipedream/blob/pipedream_2bw/planner/planner.py#L33.
> For the performance and memory cost functions, you might want to use direct measurements (from running 100 or so iterations of the respective configuration).
May I know if this is still available for non-commercial use now?
In the pipedream_2bw branch, we found the runtime that implements PipeDream-2BW, but no explanation of the planner is given. Can we use the planner?