mlcommons / inference_policies

Issues related to MLPerf™ Inference policies, including rules and suggested changes
https://mlcommons.org/en/groups/inference/
Apache License 2.0

Update the open dataset requirement #285

Closed psyhtest closed 10 months ago

github-actions[bot] commented 10 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

psyhtest commented 10 months ago

Justifying the removed text

From v3.0, if a submitter provides any results with any models trained on a pre-approved dataset, the submitter must also provide at least one result with the corresponding Closed model trained (or finetuned) on the same pre-approved dataset, and instructions to reproduce the training (or finetuning) process.

I recall we introduced this rule specifically for RetinaNet, just before it debuted in v2.1. At the time, the RetinaNet dataset, an MLPerf subset of OpenImages, was used for benchmarking one and only one model, namely the MLPerf variant of RetinaNet. Therefore, we would miss out on objectively benchmarking other research Object Detection models, which are typically trained and validated on the COCO dataset. The idea was that a potential submitter would finetune RetinaNet on COCO too, and thus provide a useful baseline figure for any comparisons on the alternative dataset.

We at KRAI actually did this for v2.1, measuring mAP=35.293% and publishing the finetuned model. This accuracy is lower than that of the reference model on OpenImages (mAP=37.55%), but much higher than, say, that of the deprecated SSD-ResNet34 model (mAP=20.00%). So a submitter showcasing their highly optimized SSD-ResNet34 implementation could legitimately claim that it is faster than RetinaNet, albeit less accurate.
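For context, this kind of baseline measurement can be reproduced with standard tooling. Below is a minimal sketch, not the MLPerf reference code: the checkpoint name `retinanet_coco_finetuned.pth` and the COCO paths are placeholders, and the torchvision ResNet50-FPN backbone used here is not necessarily the exact MLPerf RetinaNet variant. It scores a finetuned checkpoint on COCO val2017 with pycocotools to obtain an mAP figure:

```python
# Sketch: evaluate a (hypothetically) finetuned torchvision RetinaNet on COCO val2017.
import torch
import torchvision
from torchvision.datasets import CocoDetection
from torchvision.transforms import functional as F
from pycocotools.cocoeval import COCOeval

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder checkpoint and backbone choice; adjust to the actual finetuned model.
model = torchvision.models.detection.retinanet_resnet50_fpn(weights=None, num_classes=91)
model.load_state_dict(torch.load("retinanet_coco_finetuned.pth", map_location=device))
model.eval().to(device)

# Placeholder COCO paths.
dataset = CocoDetection("coco/val2017", "coco/annotations/instances_val2017.json")

results = []
with torch.no_grad():
    for idx in range(len(dataset)):
        img, _ = dataset[idx]
        image_id = dataset.ids[idx]
        (pred,) = model([F.to_tensor(img).to(device)])
        for box, score, label in zip(pred["boxes"], pred["scores"], pred["labels"]):
            x1, y1, x2, y2 = box.tolist()
            results.append({
                "image_id": image_id,
                "category_id": int(label),
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO expects [x, y, w, h]
                "score": float(score),
            })

coco_gt = dataset.coco
coco_dt = coco_gt.loadRes(results)
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # first line, AP@[0.50:0.95], is the headline mAP figure
```

The AP@[0.50:0.95] value printed by `summarize()` is the mAP number quoted above.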

This is not foolproof, however. A submitter could spend minimal effort on finetuning (or skip it altogether), claiming, for example, that RetinaNet achieves only mAP=10% on the COCO dataset. Then they could misleadingly claim that their optimized SSD-ResNet34 implementation is both faster than RetinaNet and more accurate.

Justifying the added text

When seeking such pre-approval, it is recommended that a potential submitter convincingly demonstrates the accuracy of the corresponding Closed model on the same validation dataset, which may involve retraining or finetuning the Closed model if required.

This is intended to avoid the above situation. At the very least, such a submitter would face scrutiny from the WG at the pre-approval stage :). They may still get away with handwaving it through, though :).

nv-ananjappa commented 10 months ago

@psyhtest This is perfect. Covers everything we wanted to change. LGTM.