Good idea.
We don't really need an aggregate_gradients(model, config, agg_fn) function if we have Aggregation classes anyway; Aggregation.__call__() can just take the model as input directly.
You are right. We can use it directly.
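As a rough illustration of that interface (only Aggregation and the idea of __call__() taking the model come from this thread; the subclass and its internals are assumptions):

```python
import torch.distributed as dist


class Aggregation:
    """Base class for aggregation strategies (name from this thread)."""

    def __call__(self, model):
        raise NotImplementedError


class AllReduceAggregation(Aggregation):
    """Hypothetical subclass: plain gradient averaging via all_reduce."""

    def __call__(self, model):
        world_size = dist.get_world_size()
        for param in model.parameters():
            if param.grad is not None:
                dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
                param.grad.div_(world_size)
```

A training step would then just call agg(model) after backward(), with no separate aggregate_gradients wrapper.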
Looks good! Best to make sure it can support both cases: aggregating models or gradients.
@martinjaggi For models, do you mean we also synchronize things like the moving averages in Batch Normalization, etc.?
Sometimes, yes. Though the more common case will be sending gradients (as then we can do sparsification etc., as you mention). For that, signSGD is a simple example.
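For reference, a minimal sketch of the signSGD idea (majority vote over gradient signs); the function name and loop structure are illustrative, not code from the repo:

```python
import torch
import torch.distributed as dist


def signsgd_aggregate(model):
    """Hypothetical majority-vote aggregation in the style of signSGD."""
    for param in model.parameters():
        if param.grad is not None:
            # Each worker contributes only the sign of its gradient.
            signs = torch.sign(param.grad)
            dist.all_reduce(signs, op=dist.ReduceOp.SUM)
            # Majority vote: sign of the summed signs (ties become 0).
            param.grad.copy_(torch.sign(signs))
```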
OK. @negar-foroutan already has code for Sparsified SGD. Quantized SGD would be similar. We can include both in the repo once we fix their signatures.
It's a good idea. It makes it easy to have different kinds of aggregation for both open and closed divisions.
This issue is done, right? Can this be closed?
For the moment, when we aggregate the gradients, we use something like the sketch below, where the aggregate_gradients is an all-reduce. The communication is fixed to be an all_reduce of gradients. We may want to customize it to be a sparsified/quantized/decentralized aggregation.
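The original snippet did not survive formatting; here is a minimal reconstruction of the current fixed behaviour, assuming a PyTorch-style setup (the helper name aggregate_gradients is from this issue, the body is an assumption):

```python
import torch.distributed as dist


def aggregate_gradients(model):
    # Today the communication is hard-wired: average all gradients
    # with an all_reduce across the whole process group.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad.div_(world_size)


# Typical training step:
#   loss.backward()
#   aggregate_gradients(model)
#   optimizer.step()
```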
Describe the solution you'd like
Customize the aggregate_gradients to allow for sparsified/quantized/decentralized updates. The aggregate_gradients will have one more argument, agg_fn, i.e. aggregate_gradients(model, config, agg_fn). For example, the agg_fn can be a subclass of an Aggregation base class: for the decentralized case a decentralized subclass, for the sparsified case a sparsified one, etc. (see the sketch below).
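A hedged sketch of what those subclasses could look like; apart from aggregate_gradients, agg_fn, and the Aggregation base class, every name and implementation detail here is an assumption:

```python
import torch
import torch.distributed as dist


class Aggregation:
    def __call__(self, model):
        raise NotImplementedError


class DecentralizedAggregation(Aggregation):
    """Hypothetical: average gradients only within a neighborhood subgroup."""

    def __init__(self, neighbor_ranks):
        # Simplified: dist.new_group must be called by every process.
        self.group = dist.new_group(ranks=neighbor_ranks)
        self.size = len(neighbor_ranks)

    def __call__(self, model):
        for param in model.parameters():
            if param.grad is not None:
                dist.all_reduce(param.grad, op=dist.ReduceOp.SUM,
                                group=self.group)
                param.grad.div_(self.size)


class SparsifiedAggregation(Aggregation):
    """Hypothetical: keep only the top-k entries before communicating."""

    def __init__(self, k):
        self.k = k

    def __call__(self, model):
        world_size = dist.get_world_size()
        for param in model.parameters():
            if param.grad is None:
                continue
            flat = param.grad.flatten()
            # Zero everything except the k largest-magnitude entries.
            idx = flat.abs().topk(min(self.k, flat.numel())).indices
            sparse = torch.zeros_like(flat)
            sparse[idx] = flat[idx]
            dist.all_reduce(sparse, op=dist.ReduceOp.SUM)
            param.grad.copy_(sparse.view_as(param.grad).div_(world_size))


def aggregate_gradients(model, config, agg_fn):
    # The new argument selects the aggregation strategy.
    agg_fn(model)
```

With this shape, swapping aggregation strategies is a one-line change in the training script.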