mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Add `distribution_strategy` and `all_reduce_alg` flags to TensorFlow BERT pretraining #745

Open rapsealk opened 3 weeks ago

rapsealk commented 3 weeks ago

Hello mlcommons team!

I've noticed that some utilities, such as additional flags, that exist in the ResNet50 image-classification reference are missing from BERT pretraining. This pull request adds them, which should make it easier to run BERT training in a distributed environment.

References are below:
https://github.com/mlcommons/training/blob/f0a7d0cd2e9fa198ad7cd53ee68e7be47495127e/image_classification/tensorflow2/tf2_common/utils/flags/_base.py#L140-L150
https://github.com/mlcommons/training/blob/f0a7d0cd2e9fa198ad7cd53ee68e7be47495127e/image_classification/tensorflow2/tf2_common/utils/flags/_performance.py#L224-L234
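For readers unfamiliar with the referenced ResNet50 flag helpers, here is a minimal, hypothetical sketch of how a `--distribution_strategy` / `--all_reduce_alg` flag pair is typically defined and validated. The real helpers in `tf2_common/utils/flags/` use absl flags and construct actual `tf.distribute` strategy objects; this sketch uses only the standard library and returns strategy class names, so the flag names and the strategy/algorithm value lists follow the ResNet50 reference but the helper functions themselves are illustrative, not the PR's actual code.

```python
import argparse

# Strategy values accepted by the ResNet50 reference flags (illustrative subset).
_STRATEGIES = ("off", "one_device", "mirrored", "multi_worker_mirrored")
# All-reduce algorithms that are valid for each strategy, mirroring the
# TensorFlow official-models convention referenced in this PR.
_ALL_REDUCE_ALGS = {
    "mirrored": ("nccl", "hierarchical_copy"),
    "multi_worker_mirrored": ("ring", "nccl"),
}

def define_distribution_flags(parser):
    """Register the two proposed flags (sketch; the repo uses absl flags)."""
    parser.add_argument(
        "--distribution_strategy", default="mirrored", choices=_STRATEGIES,
        help="Which tf.distribute strategy to use for training.")
    parser.add_argument(
        "--all_reduce_alg", default=None,
        help="Cross-device all-reduce algorithm; valid values depend on "
             "--distribution_strategy.")
    return parser

def resolve_strategy(distribution_strategy, all_reduce_alg=None):
    """Validate the flag pair and return the tf.distribute class name to build."""
    if distribution_strategy not in _STRATEGIES:
        raise ValueError(f"Unknown distribution_strategy: {distribution_strategy!r}")
    if all_reduce_alg is not None:
        valid = _ALL_REDUCE_ALGS.get(distribution_strategy, ())
        if all_reduce_alg not in valid:
            raise ValueError(
                f"all_reduce_alg {all_reduce_alg!r} is not valid for "
                f"{distribution_strategy!r}; expected one of {valid}")
    return {
        "off": None,  # run without a distribution strategy
        "one_device": "OneDeviceStrategy",
        "mirrored": "MirroredStrategy",
        "multi_worker_mirrored": "MultiWorkerMirroredStrategy",
    }[distribution_strategy]

if __name__ == "__main__":
    args = define_distribution_flags(argparse.ArgumentParser()).parse_args()
    print(resolve_strategy(args.distribution_strategy, args.all_reduce_alg))
```

With this wiring, a multi-GPU run would pass e.g. `--distribution_strategy=mirrored --all_reduce_alg=nccl`, and an invalid pairing fails fast at startup instead of deep inside training.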


github-actions[bot] commented 3 weeks ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅