mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

Evaluation bottlenecks #166

Open · bitfort opened this issue 5 years ago

bitfort commented 5 years ago

Sometimes evaluation can be very time-consuming and is handled by third-party code. How can we reduce the influence of third-party evaluation code's performance on benchmark scores, as well as the engineering burden it creates?

bitfort commented 5 years ago

SWG Notes:

Possible solutions (brainstorming; each has many pros/cons):

AI (Jacob): Present thoughts from HPC on this next week.
AI (all submitters): This is a call for proposals :)

jbalma commented 5 years ago

Can someone provide an example of a benchmark where third-party code is used for serial evaluation and becomes a bottleneck?

I've run into this issue with the translation and image-classification benchmarks, but haven't made it far enough in porting the other benchmarks to know which ones are most problematic.

bitfort commented 5 years ago

SWG Notes:

We believe Mask R-CNN and SSD with COCO evaluation are at the top of the list.
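
For context, here is a minimal sketch of the standard COCO evaluation path these benchmarks typically rely on, via pycocotools (the file names below are hypothetical placeholders). COCOeval is pure Python and runs serially over images and categories, which is why it can dominate wall-clock time once training itself is fast:

```python
# Standard pycocotools evaluation flow; annotation/detection
# file names are hypothetical.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_val2017.json")      # ground-truth annotations
coco_dt = coco_gt.loadRes("detections.json")  # model detections (COCO JSON)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()    # per-image, per-category matching (the slow part)
coco_eval.accumulate()  # build precision/recall tables
coco_eval.summarize()   # print the standard AP/AR metrics
```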

bitfort commented 5 years ago

SWG Notes:

Long term we'd like a unified solution to this, but for v0.6 it will be up to submitters to optimize the evaluation code themselves if they deem it necessary. We intend to revisit this issue in the future to reduce the effort submitters have to put into evaluation.
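
As one illustration of what "optimizing evaluation themselves" could look like, below is a hedged sketch (not from this thread) that moves a slow, serial evaluator into a background process so the training loop is not blocked; `slow_third_party_eval` and the detection file names are hypothetical placeholders:

```python
import multiprocessing as mp

def slow_third_party_eval(result_file, out_queue):
    """Placeholder for a serial evaluator such as pycocotools' COCOeval."""
    # ... expensive, single-threaded evaluation would run here ...
    out_queue.put((result_file, 0.377))  # hypothetical mAP score

if __name__ == "__main__":
    results = mp.Queue()
    workers = []
    for epoch in range(3):
        # train_one_epoch(...)  # training continues while evaluation runs
        p = mp.Process(target=slow_third_party_eval,
                       args=(f"detections_epoch{epoch}.json", results))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()
    for _ in workers:
        print(results.get())
```

Whether overlap like this is legal for a submission depends on how the rules count evaluation time, which is exactly the open question in this issue.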

petermattson commented 4 years ago

This is backlogged, not a rec.