In-network aggregation has been proposed as a promising way to accelerate this collective operation, and thus distributed training [2, 27, 31, 74, 57, 77, 76, 78]. In-network aggregation performs the “reduce” (i.e., sum) step of all-reduce in a network switch on the fly. This offers higher throughput and lower latency than a parameter server approach, where both the network link and the host-side network stack can become bottlenecks. Compared to ring-based and other distributed all-reduce algorithms, in-network aggregation requires exchanging fewer messages, again reducing latency and network usage.
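To make the in-switch "reduce" step concrete, the following sketch (illustrative only, not modeled on any specific system cited above) shows a switch-like aggregation point summing gradient chunks as they arrive from workers, so each worker sends its chunk once and receives one aggregated result:

```python
# Hypothetical sketch: the "reduce" (element-wise sum) step of all-reduce
# performed at an aggregation point in the network, rather than at a
# parameter server or via a ring of worker-to-worker exchanges.

def switch_aggregate(chunks):
    """Element-wise sum of the gradient chunks arriving from all workers."""
    result = [0.0] * len(chunks[0])
    for chunk in chunks:            # one pass per arriving worker packet
        for i, v in enumerate(chunk):
            result[i] += v          # in-switch accumulation, on the fly
    return result

# Each of three workers contributes one gradient chunk.
worker_chunks = [
    [1.0, 2.0, 3.0],
    [0.5, 0.5, 0.5],
    [2.0, 1.0, 0.0],
]

aggregated = switch_aggregate(worker_chunks)
print(aggregated)  # [3.5, 3.5, 3.5]
```

With n workers, each worker sends and receives one message through the aggregation point, rather than forwarding partial sums around a ring or funneling all traffic through a single server's network stack.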
[27] N. Gebara, P. Costa, and M. Ghobadi. In-network aggregation for shared machine learning clusters. In Proceedings of the 4th MLSys conference (MLSys’21), Virtual Event, Apr. 2021.
[57] B. Klenk, N. Jiang, G. Thorson, and L. Dennison. An in-network architecture for accelerating shared-memory multiprocessor collectives. In Proceedings of the 47th International Symposium on Computer Architecture (ISCA’20), Virtual Event, May 2020.