Open cabinz opened 1 year ago
Seems to be a bug with AllReduce. In fact, I tried AllGather and it solves it pretty quickly:
msccl solve instance Ring Allgather -s 2 -n 5
Thanks for reporting this!
@olsaarik can you please take a look?
Hello,
I want to share some insights into this issue.
The functions solve_least_steps and solve_all_latency_bandwidth_tradeoffs do not support CNR algorithms like Allreduce, as mentioned in SYNTHESIS.md. However, they happen to work with ring topologies of 4 nodes or fewer.
In more detail: when Allreduce is given to solve_least_steps, a ReductionNotApplicableError is raised while creating the PathEncoding object. Normally, the conditions (precondition and postcondition) of a combining algorithm are converted to the conditions of the corresponding non-combining algorithm, but for Allreduce this conversion is skipped because of the ReductionNotApplicableError. As a result, the problem is encoded with the same conditions as AllGather's, plus the extra constraints added for Allreduce. These extra constraints make the problem infeasible for a ring topology with 5 nodes, so the function keeps running forever.
I think it would be better to raise an exception when Allreduce is passed to solve_least_steps or solve_all_latency_bandwidth_tradeoffs.
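The suggestion above, failing fast instead of encoding an infeasible problem, could look roughly like the following. This is only a sketch, not the library's code: the Collective stand-in class, its is_combining flag, UnsupportedCollectiveError, and the solve_least_steps_guarded wrapper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


class UnsupportedCollectiveError(ValueError):
    """Raised when a combining collective reaches a non-combining solver."""


@dataclass
class Collective:
    # Hypothetical stand-in for the library's collective object.
    name: str
    is_combining: bool


def require_non_combining(collective: Collective, solver_name: str) -> None:
    """Reject combining collectives before any encoding is built."""
    if collective.is_combining:
        raise UnsupportedCollectiveError(
            f"{solver_name} does not support combining collectives such as "
            f"{collective.name}; see SYNTHESIS.md."
        )


def solve_least_steps_guarded(collective: Collective) -> str:
    # Hypothetical wrapper: the real solver call would follow this check.
    require_non_combining(collective, "solve_least_steps")
    return f"solving {collective.name}"
```

With a check like this, passing Allreduce would end with a clear error message right away rather than a solver run that never terminates.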
Problem
The following code cannot solve a least-step algo of allreduce on the ring topology:
The program seems to iterate forever.
Description
I found that solve_least_steps() works well with topology generic.ring(4) and collective allreduce() when I use the following code:
The terminal log is as below:
However, if I run with ring(5) using the code snippet provided in the Problem section, no valid algo can be synthesized in acceptable time. The program keeps running, with the terminal logging as below:
To the best of my knowledge, an AllReduce operator always has a reduce-broadcast implementation (though it may not be the optimal one). And the code below works properly:
with output:
indicating that there is at least a 4-step algo for allreduce on this ring(5) topology.
Could you please kindly tell me how to synthesize a least-step allreduce algo for ring topologies with 5 or more nodes?
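The reduce-broadcast argument can be checked with a small standalone simulation that does not use the library at all. It is a sketch under stated assumptions: the ring is taken to be bidirectional, each step is one synchronous round in which every link may carry one message, and all function names are made up for this example.

```python
from collections import deque


def ring_neighbors(n):
    """Assumed bidirectional ring: node i is linked to i-1 and i+1 (mod n)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def bfs_tree(neighbors, root):
    """Return (parent, depth) maps of a BFS spanning tree rooted at root."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors[node]):
            if nxt not in depth:
                parent[nxt], depth[nxt] = node, depth[node] + 1
                queue.append(nxt)
    return parent, depth


def reduce_broadcast_allreduce(values, root=0):
    """Simulate a sum allreduce as reduce-to-root followed by broadcast.

    Returns (final_values, steps); each step is one synchronous round.
    """
    n = len(values)
    parent, depth = bfs_tree(ring_neighbors(n), root)
    height = max(depth.values())
    partial = list(values)
    steps = 0
    # Reduce: the deepest nodes send first, one tree level per step.
    for level in range(height, 0, -1):
        for node in range(n):
            if depth[node] == level:
                partial[parent[node]] += partial[node]
        steps += 1
    # Broadcast: the root's total moves down one tree level per step.
    result = [None] * n
    result[root] = partial[root]
    for level in range(1, height + 1):
        for node in range(n):
            if depth[node] == level:
                result[node] = result[parent[node]]
        steps += 1
    return result, steps


values = [1, 2, 3, 4, 5]
result, steps = reduce_broadcast_allreduce(values)
print(result, steps)
```

For five nodes, every node ends with the sum 15 after 4 steps (2 for the reduce, 2 for the broadcast), which matches the 4-step bound mentioned above. It says nothing about whether 4 is the minimum.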