mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0
335 stars 69 forks

Fix scoring bug, properly handling `nan` values #780

Closed fsschneider closed 2 months ago

fsschneider commented 2 months ago

When computing our benchmark scores, we want to "ignore" runs on a base workload if the submission doesn't hit the target on the held-out workload. This is implemented here: https://github.com/mlcommons/algorithmic-efficiency/blob/c465e252c95521c223530b0523feaa38c6dd06e4/scoring/performance_profile.py#L322-L328

However, `variant_criteria_filter()` only checks for `np.inf` values (https://github.com/mlcommons/algorithmic-efficiency/blob/c465e252c95521c223530b0523feaa38c6dd06e4/scoring/performance_profile.py#L245-L257). Another invalid score that can occur is `nan`, which happens, e.g., when a run goes out of memory (OOM). In this case, the base workload score should also be ignored.
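To illustrate the difference between the two checks, here is a minimal sketch, not the repository's code. It assumes scores are kept in a plain dict; the workload names, the `HELD_OUT_TO_BASE` mapping, and the function `filter_base_scores` are hypothetical. It uses the standard library's `math.isfinite`, which rejects both `inf` and `nan`; with NumPy arrays, `np.isfinite` plays the same role.

```python
import math

# Hypothetical mapping from a held-out variant to its base workload.
# The names are illustrative and not taken from the repository.
HELD_OUT_TO_BASE = {
    "variant_a": "base_a",
    "variant_b": "base_b",
}


def filter_base_scores(scores, held_out_to_base=HELD_OUT_TO_BASE):
    """Return a copy of `scores` in which a base workload's score is set to
    inf whenever its held-out variant has a non-finite score (inf or nan)."""
    filtered = dict(scores)
    for variant, base in held_out_to_base.items():
        if variant not in scores:
            continue
        # math.isfinite is False for inf, -inf, and nan,
        # whereas a comparison against inf alone would miss nan.
        if not math.isfinite(scores[variant]):
            filtered[base] = math.inf
    return filtered


# variant_a failed with nan (e.g. an OOM run); variant_b hit its target.
scores = {
    "base_a": 120.0,
    "variant_a": float("nan"),
    "base_b": 95.0,
    "variant_b": 110.0,
}
result = filter_base_scores(scores)
```

In this sketch `base_a` is invalidated because its variant's score is `nan`, while `base_b` keeps its score. A check of the form `score == inf` would have left `base_a` untouched, since `nan` compares unequal to everything.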

This PR fixes this issue. To properly do so, it also needs to load the list of held-out workloads (to drop all other workload variants that have only been computed for the baseline).
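The variant-dropping step could look roughly like the sketch below. This is an assumption about the general shape, not the PR's implementation: the function `drop_unlisted_variants` and all workload names are hypothetical, and scores are again assumed to live in a plain dict.

```python
def drop_unlisted_variants(scores, base_workloads, held_out_workloads):
    """Keep only base workloads and the listed held-out workloads.

    Any other workload variant (for example, one that was only run for
    the baseline) is removed before scoring.
    """
    keep = set(base_workloads) | set(held_out_workloads)
    return {name: score for name, score in scores.items() if name in keep}


# Hypothetical names: variant_c was only computed for the baseline.
all_scores = {
    "base_a": 120.0,
    "variant_a": 130.0,
    "variant_c": 140.0,
}
kept = drop_unlisted_variants(
    all_scores,
    base_workloads=["base_a"],
    held_out_workloads=["variant_a"],
)
```

Here `kept` contains `base_a` and `variant_a` only, so `variant_c` cannot influence the filter or the final scores.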

github-actions[bot] commented 2 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅