TACCL synthesizes collectives for up to 80 GPUs in less than 3 minutes, at least two orders of magnitude faster than other state-of-the-art synthesis-based collective communication libraries.
To understand synthesis time, we also synthesized an ALLGATHER for 80 GPUs (10 nodes) in 8 minutes.
It takes about 41 s to generate ALLGATHER and 125 s to generate ALLTOALL for 32 GPUs on a DGX-2 system.