training_rules.adoc defines neither "benchmark name" nor "problem", but Section 4 "Divisions" implies that the benchmark name is the value given in the Problem column of the tables in Sections 3, 4, 9.4, and 11.
Closed division benchmarks must be referred to using the benchmark name plus the term Closed, e.g. “for the Image Classification Closed benchmark, the system achieved a result of 7.2.”
The distinction between a benchmark and the closed division model for a benchmark is maintained through most of the document, but breaks down in the Appendices. In "Benchmark Specific Rules" the term "ResNet" is used both as a benchmark name and as the name of a model. In the "Allowed Optimizers" section the benchmarks are again referred to by model names.
I'm about to create a pull request with a possible suggestion for how to clarify things. The pull request has a couple of TODOs because there are some questions I don't have answers to. In particular:
it's not clear from the Allowed Optimizers appendix whether these restrictions apply to the Closed division only or to the Open division as well. I've assumed Closed only. If that assumption is right, and the purpose of the Allowed Optimizers section is to restrict which optimizer implementations Closed division submissions may use, then I've made a further change: I've moved the information about which optimizers are allowed for each benchmark into Section 4, and reduced the table in this Appendix to a list of which implementations are allowed for each optimizer.
The Allowed Optimizers Appendix says:
Analysis to support this can be found in the document "MLPerf Optimizer Review" in the MLPerf Training document area.
But I can't find that document on GitHub. We should locate it, put it on GitHub somewhere, and give a working link to it.
I'm confused about the optimizer rules for the Recommendation benchmark. Image Classification, Object Detection (light weight and heavy weight), and Reinforcement Learning are all listed as "SGD with Momentum", while Recommendation is listed as just "SGD". Is that distinction correct? Further, the DLRM/SGD row of the table in the Appendix lists only torch.optim.SGD as an allowed PyTorch optimizer, while all the other benchmarks also list apex.optimizers.FusedSGD. Is that intentional or an oversight?
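Part of the confusion may be that in PyTorch both variants go through the same torch.optim.SGD class, distinguished only by the momentum argument, so "SGD" vs "SGD with Momentum" doesn't map to different allowed classes. A minimal sketch of that distinction (the learning rate and momentum values here are illustrative, not taken from the rules):

```python
import torch

# A tiny model, just to have parameters to optimize (illustrative only).
model = torch.nn.Linear(10, 1)

# Plain SGD, as the table lists for Recommendation (DLRM):
# momentum defaults to 0, so no velocity term is kept.
plain_sgd = torch.optim.SGD(model.parameters(), lr=0.01)

# SGD with Momentum, as listed for the other benchmarks:
# the same class, with a nonzero momentum coefficient.
sgd_momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# apex.optimizers.FusedSGD (NVIDIA Apex) is a fused-kernel variant taking
# the same constructor arguments; whether it is allowed for DLRM is
# exactly the question raised above.
# from apex.optimizers import FusedSGD
# fused = FusedSGD(model.parameters(), lr=0.01, momentum=0.9)
```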