In this update we implement two changes:
The sparse weight update: Added an option to either convert the sparse gradient to dense or coalesce it, depending on its size relative to the large_grad_threshold parameter. Coalesced or dense updates have better numerical properties than an uncoalesced sparse weight update, because duplicate indices are summed deterministically rather than applied one by one. This option is controlled by the --mlperf-coalesce-sparse-grads command line argument; a sketch of the idea follows below. facebookresearch/dlrm@2dd5acf
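For illustration, here is a minimal sketch of the coalesce-or-densify decision in plain PyTorch. The prepare_sparse_grad helper and the exact rule for comparing the gradient's size against large_grad_threshold are assumptions for this sketch, not the code from facebookresearch/dlrm@2dd5acf:

```python
import torch

def prepare_sparse_grad(grad: torch.Tensor, large_grad_threshold: int) -> torch.Tensor:
    # Hypothetical helper; the actual patch may use a different size rule.
    if not grad.is_sparse:
        return grad
    if grad._nnz() > large_grad_threshold:
        # Large gradient: materialize as dense. to_dense() sums duplicate
        # indices, so the result matches a coalesced update.
        return grad.to_dense()
    # Otherwise stay sparse, but sum duplicate indices first so each row's
    # contribution is applied exactly once, in a deterministic order.
    return grad.coalesce()

# Toy uncoalesced gradient with a duplicated row index (row 0 appears twice).
idx = torch.tensor([[0, 0, 2]])
val = torch.tensor([[1.0], [2.0], [3.0]])
g = torch.sparse_coo_tensor(idx, val, (4, 1))
print(prepare_sparse_grad(g, large_grad_threshold=10))  # row 0 coalesced to 3.0
```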
The gradient accumulation change: Added an option to accumulate gradients across multiple mini-batches before making an optimizer step to update the weights, thereby simulating a larger mini-batch size. This option is controlled by the --mlperf-grad-accum-iter command line argument; a sketch follows below. facebookresearch/dlrm@1302c71
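A minimal sketch of gradient accumulation in a generic PyTorch training loop; the toy model, data, and the grad_accum_iter variable are stand-ins for the real DLRM code and its --mlperf-grad-accum-iter argument:

```python
import torch
from torch import nn

model = nn.Linear(8, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(8)]

grad_accum_iter = 4  # effective batch size = 4 * 16 samples

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y)
    # Scaling by grad_accum_iter makes the summed gradients equal the
    # average over the larger effective batch (one common convention;
    # the DLRM patch may simply sum unscaled gradients).
    (loss / grad_accum_iter).backward()
    if (step + 1) % grad_accum_iter == 0:
        optimizer.step()       # one weight update per accumulation window
        optimizer.zero_grad()
```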
P.S. This fixes the conflicts in the prior PR https://github.com/mlcommons/training/pull/449.