mlcommons / inference_policies

Issues related to MLPerf™ Inference policies, including rules and suggested changes
https://mlcommons.org/en/groups/inference/
Apache License 2.0

Dataset constraints for the Open division #245

Closed psyhtest closed 2 years ago

psyhtest commented 2 years ago

For the upcoming v2.1 round, the old Object Detection benchmarks, based on the SSD-ResNet34 and SSD-MobileNet-v1 models and the COCO 2017 validation dataset, have been deprecated. The new Object Detection benchmark is based on the RetinaNet model and a subset of the Open Images v6 dataset.

According to the relaxed rules for the Open division:

2. The accuracy dataset must be the same as an existing Closed benchmark.
8. The model can be of any origin (trained on any dataset, quantized in any way, and sparsified in any way).

This raises the question of whether Object Detection models trained on the COCO 2017 dataset can still be used for submissions in the v2.1 round. According to rule 8, the answer is "yes".

However, according to rule 2, any such models must be evaluated on the same dataset as used by RetinaNet, i.e. the MLPerf subset of Open Images. While this can be arranged, the accuracy figures thus obtained are likely to differ from those published for the same models, e.g. in the TF2 Object Detection Model Zoo.

psyhtest commented 2 years ago

The discussion is not restricted to Object Detection. In fact, we will be in the same situation with Image Classification if we switch the dataset to Open Images. The crux of the problem is that ImageNet and COCO are the most popular research datasets, with hundreds if not thousands of pretrained models available. For a new dataset, such as the MLPerf subset of Open Images, there will initially be only a single reference model, i.e. one trained specifically on this dataset for MLPerf. That is, we risk losing the ability to make useful comparisons with a large body of work.

When models are evaluated on the same dataset, we can readily establish their trade-offs. For example, MobileNet-v1, with a top-1 accuracy of 71.676%, is usually faster than ResNet50-v1.5, with a top-1 accuracy of 76.456%, when both are evaluated on the ImageNet validation dataset. The performance/accuracy trade-off is clear. Suppose now that MobileNet-v1 exhibits a top-1 accuracy of 91.234% on a different dataset. Then the trade-off is no longer clear.
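To make the comparison point concrete, here is a minimal sketch of how top-1 accuracy is typically computed (plain PyTorch, not taken from any MLPerf reference implementation; the model and data loader are assumed to be defined elsewhere):

```python
import torch

def top1_accuracy(model, data_loader, device="cpu"):
    """Fraction of samples whose highest-scoring class matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# The resulting figure is only comparable across models when data_loader
# iterates over the same validation set: 71.676% on ImageNet and 91.234% on
# some other dataset say nothing about the relative accuracy of two models.
```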

psyhtest commented 2 years ago

To provide a more useful comparison in the spirit of MLPerf, we can consider some of the following options:

  1. Request that any models that submitters use in the Open division be trained or finetuned on the same dataset as used by the Closed division model.
  2. Request that the Closed division model be additionally trained or finetuned on the dataset that submitters use in the Open division, and also be submitted to the Open division on that dataset. For example, a submitter that wishes to compare the new RetinaNet model with any COCO models must finetune it on COCO as well, and make the finetuned RetinaNet model part of their submission too.

The first option is likely to be prohibitive in terms of additional effort and resources. For example, Krai routinely submit about a hundred Image Classification models and a dozen Object Detection models. It would also shift the focus away from inference towards training.

The second option only requires training or finetuning a single model, instead of many. In fact, in some cases, a pretrained model can already be widely available: for example, even if ResNet50 gets retrained on Open Images for the Closed division one day, we can still use the same tried-and-tested ResNet50 model trained on ImageNet that we have been using since v0.5. I believe NVIDIA might already have RetinaNet trained on COCO.
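To illustrate what the second option might involve in practice, here is a rough sketch of fine-tuning a RetinaNet-style model on COCO using torchvision. This is not the MLPerf reference code: the checkpoint path, the class counts (264 Open Images classes vs. 91 COCO categories) and the `coco_loader` are assumptions for illustration only.

```python
import torch
import torchvision
from torchvision.models.detection.retinanet import RetinaNetClassificationHead

# Start from a model with the same architecture as the Closed division reference.
# (Loading the actual reference checkpoint is assumed; the path is hypothetical.)
model = torchvision.models.detection.retinanet_resnet50_fpn(weights=None, num_classes=264)
# model.load_state_dict(torch.load("retinanet_openimages_reference.pth"))

# Replace the classification head so that it predicts COCO's 91 categories.
num_anchors = model.head.classification_head.num_anchors
in_channels = model.backbone.out_channels
model.head.classification_head = RetinaNetClassificationHead(
    in_channels, num_anchors, num_classes=91
)

def fine_tune(model, coco_loader, epochs=1, lr=1e-3):
    """Standard fine-tuning loop; coco_loader must yield (images, targets),
    where each target is a dict with "boxes" and "labels" in torchvision format."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in coco_loader:
            loss_dict = model(images, targets)  # returns per-component losses
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

The finetuned model (and, under the third restriction discussed below, the fine-tuning code) would then accompany the Open division submission.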

DilipSequeira commented 2 years ago

Hi Anton. If we're relaxing constraints in the Open division, it might be better to do it somewhat conservatively, so that people are not able to, e.g., leverage MLPerf's credibility to disparage their competitors using a targeted combination of model, fine-tuning recipe, dataset, and scenario adjustment.

What kind of restrictions would be workable for you when using a dataset/benchmark combination other than the one used in the Closed division? For example:

- the dataset must be a commonly used dataset agreed in advance with the WG;
- models must be taken from a public model zoo approved by the WG;
- the reference model for the benchmark must be fine-tuned to the new dataset, and the fine-tuning code must be Available.

psyhtest commented 2 years ago

the dataset must be a commonly used dataset agreed in advance with the WG

Agree, that's the whole point. We could perhaps pre-approve datasets we used in the past: ImageNet, COCO, BraTS?

models must be taken from a public model zoo approved by the WG

This would be workable for Krai but might be too restrictive. For example, Deci have a private model zoo.

the reference model for the benchmark must be fine-tuned to the new dataset, and the fine-tuning code must be Available.

I like this in principle, but I'm cautious about the remaining time before v2.1. Maybe request this from v3.0?

TheKanter commented 2 years ago

@petermattson @johntran-nv I think new datasets are allowed. Pulling in training folks to comment. I believe we have allowed new datasets and new models and that is how we got the original large language model in the last training round.

rnaidu02 commented 2 years ago

@petermattson @johntran-nv Any feedback on the comment from David on allowing new datasets for Open submission?

petermattson commented 2 years ago

They're allowed in Open in Training. Personally, I believe dataset construction is an important area of innovation and would allow them, but try to improve how they're disclosed on the Open results sheet. However, this is a WG decision. :-)
