mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Updating DLRM #453

Closed mnaumovfb closed 3 years ago

mnaumovfb commented 3 years ago

In this update we implement the following changes:

The sparse weight update: Added an option to either coalesce the sparse gradient or convert it to dense, depending on its size and the large_grad_threshold parameter. Using coalesced or dense updates has better numerical properties than using a sparse uncoalesced weight update. This option is controlled by the --mlperf-coalesce-sparse-grads command-line argument. facebookresearch/dlrm@2dd5acf
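
As a rough illustration of the idea (this is a sketch, not the actual DLRM code; the helper name and the exact size criterion tied to large_grad_threshold are assumptions), one could post-process the sparse gradient of an embedding table before the optimizer step:

```python
import torch

def prepare_sparse_grad(param: torch.nn.Parameter, large_grad_threshold: int) -> None:
    """Coalesce a sparse gradient, or densify it when it is large.

    Illustrative only: the real criterion used by the PR may differ.
    """
    grad = param.grad
    if grad is None or not grad.is_sparse:
        return
    # Coalescing merges duplicate row indices, so each embedding row is
    # updated exactly once instead of receiving several partial updates.
    grad = grad.coalesce()
    # If many rows are touched, converting to dense avoids sparse-update
    # ordering effects at the cost of extra memory (assumed threshold rule).
    if grad._nnz() > large_grad_threshold:
        grad = grad.to_dense()
    param.grad = grad
```

In a training loop this would run after loss.backward() and before optimizer.step(), for each embedding parameter created with sparse=True.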

The gradient accumulation change: Added an option to accumulate gradients across multiple mini-batches before taking an optimizer step to update the weights, thereby simulating a larger mini-batch size. This option is controlled by the --mlperf-grad-accum-iter command-line argument. facebookresearch/dlrm@1302c71
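
A minimal gradient-accumulation sketch, assuming a generic PyTorch loop (model, optimizer, loader, and loss_fn are placeholders, and grad_accum_iter stands in for the value passed via --mlperf-grad-accum-iter; this is not the DLRM training loop itself):

```python
import torch

def train_epoch(model, optimizer, loader, loss_fn, grad_accum_iter: int = 4) -> None:
    optimizer.zero_grad()
    for step, (features, labels) in enumerate(loader, start=1):
        loss = loss_fn(model(features), labels)
        # Scale the loss so the accumulated gradient matches what a single
        # mini-batch of size (batch_size * grad_accum_iter) would produce.
        (loss / grad_accum_iter).backward()
        if step % grad_accum_iter == 0:
            optimizer.step()       # one weight update per grad_accum_iter mini-batches
            optimizer.zero_grad()
```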

P.S. This fixes the conflicts in the prior PR https://github.com/mlcommons/training/pull/449

github-actions[bot] commented 3 years ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅