mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Add `distribution_strategy` and `all_reduce_alg` flags to TensorFlow BERT pretraining #745

Open rapsealk opened 5 months ago

rapsealk commented 5 months ago

Hello mlcommons team!

I've noticed that some utilities, such as additional flags, are missing from BERT pretraining, unlike ResNet50 image classification. This pull request adds those flags to make it easier to run BERT pretraining in a distributed environment.

References:

- https://github.com/mlcommons/training/blob/f0a7d0cd2e9fa198ad7cd53ee68e7be47495127e/image_classification/tensorflow2/tf2_common/utils/flags/_base.py#L140-L150
- https://github.com/mlcommons/training/blob/f0a7d0cd2e9fa198ad7cd53ee68e7be47495127e/image_classification/tensorflow2/tf2_common/utils/flags/_performance.py#L224-L234
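For context, a minimal sketch of what such flag definitions look like, modeled on the referenced ResNet50 utilities (the flag names come from the PR title, but the default values and the `define_distribution_flags` helper name here are assumptions, not the PR's actual code):

```python
# Hedged sketch: absl flag definitions modeled on the ResNet50 flag
# utilities referenced above. Defaults and helper name are assumptions.
from absl import flags


def define_distribution_flags():
    flags.DEFINE_string(
        name="distribution_strategy",
        default="mirrored",
        help="The tf.distribute strategy to use for training, e.g. "
             "'off', 'one_device', 'mirrored', or "
             "'multi_worker_mirrored'.")
    flags.DEFINE_string(
        name="all_reduce_alg",
        default=None,
        help="Which algorithm to use for all-reduce, e.g. 'nccl' or "
             "'hierarchical_copy' for MirroredStrategy; 'ring' or "
             "'nccl' for MultiWorkerMirroredStrategy. Defaults to "
             "TensorFlow's automatic choice when unset.")
```

A training script would then read `flags.FLAGS.distribution_strategy` after parsing argv to construct the matching `tf.distribute` strategy.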


github-actions[bot] commented 5 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅