The discussion is not restricted to Object Detection. In fact, we would be in the same situation with Image Classification if we switched the dataset to Open Images. The crux of the problem is that ImageNet and COCO are the most popular research datasets, with hundreds if not thousands of pretrained models available. For a new dataset, such as the MLPerf subset of Open Images, there will initially be only a single reference model, i.e. one trained on this dataset specifically for MLPerf. In other words, we risk losing the ability to make useful comparisons with a large body of prior work.
When models operate on the same dataset, we can more readily establish their trade-offs. For example, MobileNet-v1, with a top-1 accuracy of 71.676%, is usually faster than ResNet50-v1.5, with a top-1 accuracy of 76.456%, when both are evaluated on the ImageNet validation dataset. The performance/accuracy trade-off is clear. Suppose now that MobileNet-v1 exhibits a top-1 accuracy of 91.234% on a different dataset. The trade-off is no longer clear.
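To make the point concrete, here is a purely illustrative Python sketch (the models and accuracy figures are the ones quoted above; "other-dataset" is a placeholder): a trade-off can only be read off results measured on the same dataset.

```python
# Purely illustrative: trade-offs can only be read off results
# measured on the same validation dataset.
results = [
    # (model, dataset, top-1 accuracy %, relative speed)
    ("MobileNet-v1",  "ImageNet",      71.676, "fast"),
    ("ResNet50-v1.5", "ImageNet",      76.456, "slow"),
    ("MobileNet-v1",  "other-dataset", 91.234, "fast"),
]

def comparable(a, b):
    """Two results can be traded off only if they refer to the same dataset."""
    return a[1] == b[1]

print(comparable(results[0], results[1]))  # True:  76.456% vs 71.676% is meaningful
print(comparable(results[1], results[2]))  # False: 91.234% vs 76.456% tells us nothing
```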
To provide a more useful comparison in the spirit of MLPerf, we can consider some of the following options:

1. Retrain every model to be submitted on the new dataset.
2. Train or fine-tune the single reference model on the established dataset (e.g. COCO), so that other models trained on that dataset can still be compared against it.
The first option is likely to be prohibitive in terms of additional effort and resources. For example, Krai routinely submit about a hundred Image Classification models and a dozen Object Detection models. The focus would also shift away from inference towards training.
The second option only requires training or fine-tuning a single model, instead of many. In fact, in some cases a suitable pretrained model is already widely available: for example, even if ResNet50 gets retrained on Open Images for the Closed division one day, we can still use the same tried-and-tested ResNet50 model trained on ImageNet that we have been using since v0.5. I believe NVIDIA might already have RetinaNet trained on COCO.
Hi Anton, if we're relaxing constraints in the Open division, it might be better to do it somewhat conservatively, so that people cannot, for example, leverage MLPerf's credibility to disparage their competitors using a targeted combination of model, fine-tuning recipe, dataset, and scenario adjustment.
What kind of restrictions would be workable for you when using a dataset/benchmark combination other than the one used in the Closed division? For example:
the dataset must be a commonly used dataset agreed in advance with the WG
Agree, that's the whole point. We could perhaps pre-approve datasets we used in the past: ImageNet, COCO, BraTS?
models must be taken from a public model zoo approved by the WG
This would be workable for Krai but might be too restrictive. For example, Deci have a private model zoo.
the reference model for the benchmark must be fine-tuned to the new dataset, and the fine-tuning code must be Available.
I like this in principle, but am cautious about the time remaining before v2.1. Maybe require this from v3.0?
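For what it's worth, fine-tuning the reference model need not be heavyweight. Below is a minimal sketch, not the MLPerf reference recipe: it uses torchvision's RetinaNet-ResNet50-FPN (torchvision >= 0.13) as a stand-in for the reference model (which uses a ResNeXt50 backbone), and the number of classes, the sample data and the training schedule are placeholders.

```python
# A minimal fine-tuning sketch (NOT the MLPerf reference recipe): torchvision's
# RetinaNet-ResNet50-FPN stands in for the reference model, and the number of
# classes, the sample data and the schedule are placeholders.
import torch
from torchvision.models.detection import retinanet_resnet50_fpn
from torchvision.models.detection.retinanet import RetinaNetClassificationHead

NUM_CLASSES = 264  # placeholder: size of the new dataset's label set

model = retinanet_resnet50_fpn(weights="COCO_V1")

# Swap the classification head so that it matches the new label set;
# the backbone and regression head keep their pretrained weights.
in_channels = model.backbone.out_channels
num_anchors = model.head.classification_head.num_anchors
model.head.classification_head = RetinaNetClassificationHead(
    in_channels, num_anchors, NUM_CLASSES
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
model.train()

# One dummy training step with random data, just to show the expected format;
# a real recipe would iterate over a COCO-style detection DataLoader.
images = [torch.rand(3, 800, 800)]
targets = [{"boxes": torch.tensor([[10.0, 20.0, 300.0, 400.0]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)   # dict: classification and bbox regression losses
loss = sum(losses.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```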
@petermattson @johntran-nv I think new datasets are allowed. Pulling in the training folks to comment. I believe we have allowed new datasets and new models; that is how we got the original large language model in the last Training round.
@petermattson @johntran-nv Any feedback on the comment from David on allowing new datasets for Open submission?
They're allowed in Open in Training. Personally, I believe dataset construction is an important area of innovation and would allow them, but try to improve how they're disclosed on the Open results sheet. However, this is a WG decision. :-)
For the upcoming v2.1 round, the old Object Detection benchmarks, based on the SSD-ResNet34 and SSD-MobileNet-v1 models and the COCO 2017 validation dataset, have been deprecated. A new Object Detection benchmark has been introduced, based on the RetinaNet model and a subset of the Open Images v6 dataset.
According to the relaxed rules for the Open division:
This raises the question of whether Object Detection models trained on the COCO 2017 dataset can still be used for submissions to the v2.1 round. According to 8, the answer is "yes".
However, according to 2, any such models must be evaluated on the same dataset as used by RetinaNet, i.e. the MLPerf subset of Open Images. While this can be arranged, the accuracy figures obtained this way are likely to differ from those published for the same models, e.g. in the TF2 Object Detection Model Zoo.
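For reference, evaluating a COCO-trained model on the MLPerf subset of Open Images boils down to scoring its detections against the MLPerf annotations rather than against COCO 2017. A minimal sketch with pycocotools is below; it assumes the annotations have already been converted to COCO format (the reference implementation provides such a conversion; the file names are placeholders) and that the detections have been dumped to a COCO-style results JSON. Note also that the COCO and Open Images label sets differ, so a class mapping is needed on top of this, which is another reason the published COCO figures do not carry over directly.

```python
# A minimal accuracy-checking sketch with pycocotools. File names are
# placeholders; it assumes the MLPerf Open Images annotations are in COCO
# format and the model's detections are in a COCO-style results JSON:
#   [{"image_id": ..., "category_id": ..., "bbox": [x, y, w, h], "score": ...}, ...]
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

ANNOTATIONS = "annotations/openimages-mlperf.json"  # assumed path/name
DETECTIONS = "detections.json"

coco_gt = COCO(ANNOTATIONS)              # ground truth: MLPerf Open Images subset
coco_dt = coco_gt.loadRes(DETECTIONS)    # the model's detections

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()                    # prints mAP on the MLPerf subset
```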