msr-fiddle / pipedream


How to determine replication factors #37

ADAM-CT opened this issue 4 years ago

ADAM-CT commented 4 years ago

I ran the following command:

python optimizer_graph_hierarchical.py -f ../profiler/image_classification/profiles/vgg16/graph.txt -n 8 2 --activation_compression_ratio 1 -o vgg16_partitioned -b 4294967296 2147483648

and got the output below. Can you tell me what the stages and replication factors are in this example, and how they are derived from the following calculations?

Total number of states: 40
Solving optimization problem with 8 machines with inter-machine bandwidth of 4.29 GB/s
[[0.04692 0.056919000000000004 0.21645 ... 0.6715939999999999 0.6715939999999999 0.6725349999999999]
 [None 0.009999000000000001 0.16953000000000001 ... 0.624674 0.624674 0.6256149999999999]
 [None None 0.159531 ... 0.6146749999999999 0.6146749999999999 0.6156159999999998]
 ...
 [None None None ... None 0.0 0.0009409999999999696]
 [None None None ... None None 0.0009409999999999696]
 [None None None ... None None None]]
Solving optimization problem with 2 machines with inter-machine bandwidth of 2.15 GB/s
[[0.005865730156898499 0.007115605156898499 0.027072026604413987 ... 0.09235717243739539 0.09235717243739539 0.09235717243739539]
 [None 0.0012498750000000001 0.02120629644751549 ... 0.0856534978594099 0.0856534978594099 0.0856534978594099]
 [None None 0.01995642144751549 ... 0.08422506928798132 0.08422506928798132 0.08422506928798132]
 ...
 [None None None ... None 0.0 0.0009765625]
 [None None None ... None None 0.001786962507337328]
 [None None None ... None None None]]
[[0.002933282310962677 0.0035582198109626773 0.013545028504729271 ... 0.07743855509553638 0.07743855509553638 0.07839246224258628]
 [None 0.0006249375000000001 0.010611746193766595 ... 0.0740863005740302 0.0740863005740302 0.07504020772108011]
 [None None 0.009986808693766594 ... 0.07337208628831592 0.07337208628831592 0.07432599343536582]
 ...
 [None None None ... None 0.0 0.0014421883970499039]
 [None None None ... None None 0.0018473884007185679]
 [None None None ... None None None]]

Level 2
Number of machines used: 2...
Compute time = 0.335797, Data-parallel communication time = 0.250080...
Number of machines in budget not used: 0...
(Split start, split end) / compute time taken per stage / replication factor per stage:
(0, 40) 0.6725349999999999 2
Total number of stages: 1

Level 1
Number of machines used: 1...
Split between layers 23 and 24...
Split before antichain ['node26']...
Compute time = 0.049474, Data-parallel communication time = 0.000000, Pipeline-parallel communication time = 0.023926...
Number of machines used: 7...
Compute time = 0.088874, Data-parallel communication time = 0.003483...
Number of machines in budget not used: 0...
(Split start, split end) / compute time taken per stage / replication factor per stage:
(0, 24) 0.6221200000000001 7
(24, 40) 0.050414999999999766 1

Total number of stages: 2
Time taken by single-stage pipeline: 0.6725349999999999
Time per stage in pipeline: 0.07839246224258628
Throughput increase (compared to single machine): 8.579077385257188
[Note that single-machine and (8,2)-machine DP might not fit given memory constraints]
Throughput increase of (8,2)-machine DP compared to single machine: 6.5655154772476045
Throughput increase (compared to (8,2)-machine DP): 1.3066875578905357
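(Editor's note, not part of the original output: the speedup figures at the end are simple ratios of the printed times. A quick sanity check, re-deriving the reported values:)

# Re-deriving the reported speedups from the printed times (sanity check only).
single_stage_time = 0.6725349999999999      # "Time taken by single-stage pipeline"
pipeline_stage_time = 0.07839246224258628   # "Time per stage in pipeline"

# Throughput increase compared to a single machine:
print(single_stage_time / pipeline_stage_time)        # ~8.579

# Throughput increase compared to (8,2)-machine data parallelism
# (ratio of the two speedups reported above):
print(8.579077385257188 / 6.5655154772476045)         # ~1.307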

ADAM-CT commented 4 years ago

My environment: Server 1: 8x V100, Server 2: 8x V100. Model: VGG16.

ADAM-CT commented 4 years ago

I think the replication factor of stage 0 is 7 and the replication factor of stage 1 is 1, but that way only 8 GPUs are used. I don't know whether my understanding is correct.

deepakn94 commented 4 years ago

So the configuration here is a (7, 1) configuration replicated twice. In other words, ranks 0-6 and 8-14 run the first stage, and ranks 7 and 15 run the second stage.
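(Editor's note: spelling out the assignment described above as a small sketch. This is an assumed layout for illustration, not a file or data structure PipeDream itself produces.)

# Illustration of the rank-to-stage assignment described above (assumed layout).
stage_to_ranks = {
    0: [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14],  # 14 replicas of the first stage
    1: [7, 15],                                           # 2 replicas of the second stage
}
assert sum(len(ranks) for ranks in stage_to_ranks.values()) == 16  # all GPUs on both servers are used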

You need to run the configurations from Level 1 to Level n (the number of levels corresponds to the depth of your network hierarchy -- 2 in your example).
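(Editor's note: a short sketch of how the hierarchy arguments in the original command line map onto the two levels in the output above. This is an interpretation of this particular run, not generated code.)

# How the two levels in this example relate to the command-line arguments
# (values copied from the original command).
num_machines_per_level = [8, 2]            # from "-n 8 2"
bandwidths_per_level = [4294967296,        # from "-b ...": ~4.29 GB/s within a server
                        2147483648]        #                ~2.15 GB/s across servers
# Level 1: partition the model across the 8 GPUs of one server -> stages (0, 24) and (24, 40).
# Level 2: replicate that Level-1 configuration across the 2 servers -> replication factor 2.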

ADAM-CT commented 4 years ago

I think I understand what you mean, but I don't know what to do next. I still don't know how to pass the parameters (--stage_to_num_ranks), or how to use convert_graph_to_model.py to generate the models.

python convert_graph_to_model.py -f vgg16_partitioned/gpus=16.txt -n VGG16Partitioned -a vgg16 -o ../runtime/image_classification/models/vgg16/gpus=16 --stage_to_num_ranks 0:14,1:2
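(Editor's note: the --stage_to_num_ranks value above is a comma-separated list of stage:count pairs. A minimal illustrative parse of that string, not PipeDream's own code:)

# Illustrative parsing of the --stage_to_num_ranks value used above.
stage_to_num_ranks = "0:14,1:2"
mapping = dict(
    (int(stage), int(count))
    for stage, count in (entry.split(":") for entry in stage_to_num_ranks.split(","))
)
print(mapping)  # {0: 14, 1: 2} -> 14 ranks run stage 0, 2 ranks run stage 1 (16 GPUs total)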