mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Add `distribution_strategy` and `all_reduce_alg` flags to TensorFlow BERT pretraining #745

Open rapsealk opened 3 weeks ago

rapsealk commented 3 weeks ago

Hello mlcommons team!

I've noticed that some utilities, such as additional flags, that exist in the ResNet50 image-classification reference are missing from BERT pretraining. This pull request adds them, which should make it easier to run BERT training in a distributed environment.

References are below:
https://github.com/mlcommons/training/blob/f0a7d0cd2e9fa198ad7cd53ee68e7be47495127e/image_classification/tensorflow2/tf2_common/utils/flags/_base.py#L140-L150
https://github.com/mlcommons/training/blob/f0a7d0cd2e9fa198ad7cd53ee68e7be47495127e/image_classification/tensorflow2/tf2_common/utils/flags/_performance.py#L224-L234
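For readers unfamiliar with the referenced ResNet50 flag helpers, here is a minimal, hypothetical sketch of how a `--distribution_strategy` / `--all_reduce_alg` flag pair is typically defined and validated. The real helpers in `tf2_common/utils/flags/` use absl flags and construct actual `tf.distribute` strategy objects; this sketch uses only the standard library and returns strategy class names, so the flag names and the strategy/algorithm value lists follow the ResNet50 reference but the helper functions themselves are illustrative, not the PR's actual code.

```python
import argparse

# Strategy values accepted by the ResNet50 reference flags (illustrative subset).
_STRATEGIES = ("off", "one_device", "mirrored", "multi_worker_mirrored")
# All-reduce algorithms that are valid for each strategy, mirroring the
# TensorFlow official-models convention referenced in this PR.
_ALL_REDUCE_ALGS = {
    "mirrored": ("nccl", "hierarchical_copy"),
    "multi_worker_mirrored": ("ring", "nccl"),
}

def define_distribution_flags(parser):
    """Register the two proposed flags (sketch; the repo uses absl flags)."""
    parser.add_argument(
        "--distribution_strategy", default="mirrored", choices=_STRATEGIES,
        help="Which tf.distribute strategy to use for training.")
    parser.add_argument(
        "--all_reduce_alg", default=None,
        help="Cross-device all-reduce algorithm; valid values depend on "
             "--distribution_strategy.")
    return parser

def resolve_strategy(distribution_strategy, all_reduce_alg=None):
    """Validate the flag pair and return the tf.distribute class name to build."""
    if distribution_strategy not in _STRATEGIES:
        raise ValueError(f"Unknown distribution_strategy: {distribution_strategy!r}")
    if all_reduce_alg is not None:
        valid = _ALL_REDUCE_ALGS.get(distribution_strategy, ())
        if all_reduce_alg not in valid:
            raise ValueError(
                f"all_reduce_alg {all_reduce_alg!r} is not valid for "
                f"{distribution_strategy!r}; expected one of {valid}")
    return {
        "off": None,  # run without a distribution strategy
        "one_device": "OneDeviceStrategy",
        "mirrored": "MirroredStrategy",
        "multi_worker_mirrored": "MultiWorkerMirroredStrategy",
    }[distribution_strategy]

if __name__ == "__main__":
    args = define_distribution_flags(argparse.ArgumentParser()).parse_args()
    print(resolve_strategy(args.distribution_strategy, args.all_reduce_alg))
```

With this wiring, a multi-GPU run would pass e.g. `--distribution_strategy=mirrored --all_reduce_alg=nccl`, and an invalid pairing fails fast at startup instead of deep inside training.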


github-actions[bot] commented 3 weeks ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅