zhhangBian opened this issue 4 days ago
Hi @zhhangBian, we originally included `all_gather` to represent some parallel strategies that we were experimenting with. However, as of today we actually only use `all_reduce` and `send_recv` operations -- which are sufficient to represent tensor and pipeline parallelism.
@AgrawalAmey Thank you for your response!
From my understanding, a scheme that combines a row-partition with a column-partition needs an all-reduce to sum the partial results, while a scheme that applies only a row-partition or only a column-partition needs an all-gather to reassemble the sliced output.
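The intuition above can be checked numerically. Here is a minimal NumPy sketch simulating two tensor-parallel ranks in a single process (all names are hypothetical, and the elementwise nonlinearity between the two matmuls is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))   # input activations
A = rng.standard_normal((4, 6))   # first weight: split column-wise
B = rng.standard_normal((6, 4))   # second weight: split row-wise

# Column-partition A across 2 "ranks"; row-partition B to match.
A_shards = np.split(A, 2, axis=1)   # each rank holds 3 columns of A
B_shards = np.split(B, 2, axis=0)   # each rank holds 3 rows of B

# Each rank computes a full-shaped partial result; summing the partials
# (the all_reduce) recovers the unpartitioned product.
partials = [x @ A_s @ B_s for A_s, B_s in zip(A_shards, B_shards)]
all_reduced = sum(partials)
assert np.allclose(all_reduced, x @ A @ B)

# With only a column partition, each rank holds a slice of the output,
# so reassembly is a concatenation (the all_gather), not a sum.
slices = [x @ A_s for A_s in A_shards]
gathered = np.concatenate(slices, axis=1)
assert np.allclose(gathered, x @ A)
```

So pairing the partitions lets the partial results combine with a single all-reduce, whereas an unpaired partition leaves output slices that must be gathered.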
However, I’m still curious: could you elaborate on what you mean by “only use `all_reduce` and `send_recv` operations -- which are sufficient to represent tensor and pipeline parallelism”? Could you also explain the underlying principles and how this is implemented in the simulator?
Thank you so much for your help!
Hello Vidur,
Thank you for sharing your work. While reading the code and documentation, I encountered some questions related to the Profiling Communication Operators mentioned in the paper.
In the paper, it is noted that there are three collective operations: `all_reduce`, `all_gather`, and `send_recv`. However, in the simulated device data located at `data/compute`, it seems that simulation parameters are provided only for `all_reduce` and `send_recv`. There are no simulation parameters for the `all_gather` operation.

After reviewing the relevant code in `vidur/profiling`, it appears that `all_gather` is treated as device-independent, and thus its parameters are not explicitly introduced. However, isn’t `all_gather` typically device-dependent? If so, could you clarify why it is treated as device-independent in this case?

Additionally, in `vidur/profiling/collectives/main.py`, the `--collective` argument only supports `choices=["all_reduce", "send_recv"]`. Could you explain the rationale behind excluding `all_gather` as an option here?

The above are my points of confusion while going through the code. I would greatly appreciate it if you could provide clarification or corrections if I have misunderstood any part of your work.
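One possibility I considered (this is my own guess, not anything stated in the paper or code) is that an all_gather can be decomposed into a ring of send/recv steps, so a separate profile might be redundant once `send_recv` is modeled. A single-process NumPy sketch of that decomposition, with simulated ranks and no real communication:

```python
import numpy as np

def ring_all_gather(shards):
    """Simulate a ring all_gather in which each of the n-1 steps is a
    single send/recv between neighbouring ranks (hypothetical sketch)."""
    n = len(shards)
    bufs = [[None] * n for _ in range(n)]
    for r in range(n):
        bufs[r][r] = shards[r]          # each rank starts with its own shard
    for step in range(n - 1):
        # the slot each rank forwards this step: the shard it received last step
        moving = [(r - step) % n for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n           # send to the right neighbour
            bufs[dst][moving[r]] = bufs[r][moving[r]]
    # after n-1 steps every rank holds every shard, in slot order
    return [np.concatenate(buf) for buf in bufs]

shards = [np.array([float(r)]) for r in range(4)]
out = ring_all_gather(shards)
assert all(np.allclose(o, [0.0, 1.0, 2.0, 3.0]) for o in out)
```

If the simulator models `send_recv` latency per device, the cost of such a ring could in principle be derived from it, though whether that is the actual rationale is exactly what I am asking about.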
Thank you in advance for your time and insights!