mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

DellEMC borrowing hparam for resnet on DSS8440 #377

Closed: hanyunfan closed this issue 4 years ago.

hanyunfan commented 4 years ago
division                                 closed             closed                  closed (Borrowed)
submitter                                Inspur             Dell                    Dell
benchmark                                resnet             resnet                  resnet
system_name                              nf5488_mxnet       DSS8440                 DSS8440
number_of_nodes                          1                  1                       1
accelerators_per_node                    8                  8                       8
accelerator_model_name                   Nvidia Tesla A100  NVIDIA V100S-PCIe-32GB  NVIDIA V100S-PCIe-32GB
global_batch_size                        3264               1664                    3264
lars_epsilon                             0                  -                       0
lars_opt_base_learning_rate              10.5               -                       10.5
lars_opt_end_learning_rate               0.0001             0.0001                  0.0001
lars_opt_learning_rate_decay_poly_power  2                  2                       2
lars_opt_learning_rate_decay_steps       37                 -                       37
lars_opt_learning_rate_warmup_epochs     2                  -                       2
lars_opt_momentum                        0.9                -                       0.9
lars_opt_weight_decay                    0.0001             -                       0.0001
model_bn_span                            208                208                     208
opt_learning_rate_warmup_epochs          -                  3                       -
opt_name                                 lars               sgd                     lars
sgd_opt_base_learning_rate               -                  3                       -
sgd_opt_end_learning_rate                -                  0.0001                  -
sgd_opt_learning_rate_decay_poly_power   -                  2                       -
sgd_opt_learning_rate_decay_steps        -                  42                      -
sgd_opt_momentum                         -                  0.9                     -
sgd_opt_weight_decay                     -                  2.50E-05                -
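The borrowing can be checked mechanically against the table. The Python sketch below is illustrative only: the dictionaries are a hand transcription of the hyperparameter rows (blank cells omitted), and `diff` and `per_accelerator_batch` are hypothetical helpers, not part of any MLPerf tooling. It confirms that the borrowed Dell set matches Inspur's on every listed hyperparameter, lists what changes relative to Dell's original SGD run, and shows the per-accelerator batch implied by 1 node with 8 accelerators.

```python
# Hand transcription of the hyperparameter rows in the table.
# Blank cells are omitted, so a missing key means "not reported".
inspur = {
    "global_batch_size": 3264,
    "model_bn_span": 208,
    "opt_name": "lars",
    "lars_epsilon": 0,
    "lars_opt_base_learning_rate": 10.5,
    "lars_opt_end_learning_rate": 0.0001,
    "lars_opt_learning_rate_decay_poly_power": 2,
    "lars_opt_learning_rate_decay_steps": 37,
    "lars_opt_learning_rate_warmup_epochs": 2,
    "lars_opt_momentum": 0.9,
    "lars_opt_weight_decay": 0.0001,
}

dell_original = {
    "global_batch_size": 1664,
    "model_bn_span": 208,
    "opt_name": "sgd",
    "lars_opt_end_learning_rate": 0.0001,
    "lars_opt_learning_rate_decay_poly_power": 2,
    "opt_learning_rate_warmup_epochs": 3,
    "sgd_opt_base_learning_rate": 3,
    "sgd_opt_end_learning_rate": 0.0001,
    "sgd_opt_learning_rate_decay_poly_power": 2,
    "sgd_opt_learning_rate_decay_steps": 42,
    "sgd_opt_momentum": 0.9,
    "sgd_opt_weight_decay": 2.5e-05,
}

dell_borrowed = {
    "global_batch_size": 3264,
    "model_bn_span": 208,
    "opt_name": "lars",
    "lars_epsilon": 0,
    "lars_opt_base_learning_rate": 10.5,
    "lars_opt_end_learning_rate": 0.0001,
    "lars_opt_learning_rate_decay_poly_power": 2,
    "lars_opt_learning_rate_decay_steps": 37,
    "lars_opt_learning_rate_warmup_epochs": 2,
    "lars_opt_momentum": 0.9,
    "lars_opt_weight_decay": 0.0001,
}

ACCELERATORS_PER_NODE = 8  # every column: 1 node x 8 accelerators


def diff(a, b):
    """Return {key: (a_value, b_value)} for keys that differ or are absent on one side."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def per_accelerator_batch(hp, accelerators=ACCELERATORS_PER_NODE):
    """Split the global batch evenly across the accelerators of a single node."""
    batch, remainder = divmod(hp["global_batch_size"], accelerators)
    assert remainder == 0, "global batch does not divide evenly"
    return batch


print("borrowed vs. Inspur:", diff(inspur, dell_borrowed))
for key, (before, after) in diff(dell_original, dell_borrowed).items():
    print(f"{key}: {before} -> {after}")
print("per-accelerator batch:",
      per_accelerator_batch(dell_original), "->", per_accelerator_batch(dell_borrowed))
```

An empty diff against Inspur is what "borrowing" means here: only the system and accelerator rows differ between those two columns, not the hyperparameters.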
hanyunfan commented 4 years ago

Performance improvement: the time-to-train result drops from 79 to 70.2.
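The reported gain can be expressed in relative terms. A minimal Python sketch, assuming 79 and 70.2 are both time-to-train results in the same unit (lower is better); `improvement` is a hypothetical helper introduced only for this arithmetic.

```python
def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (fractional reduction in time to train, speedup factor)."""
    reduction = (before - after) / before  # share of the original time saved
    speedup = before / after               # how many times faster the new run is
    return reduction, speedup


reduction, speedup = improvement(79.0, 70.2)
print(f"time-to-train reduction: {reduction:.1%}, speedup: {speedup:.3f}x")
```

Under that assumption, borrowing the hyperparameters cuts roughly 11% off the time to train, a speedup of about 1.13x.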

petermattson commented 4 years ago

Submitters: OK

petermattson commented 4 years ago

Submitters: we allow precisely this one set of hyperparameters to be borrowed through end of today. No other late borrowing.