MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
When computing our benchmark scores, we want to "ignore" runs on a base workload if the submission doesn't hit the target on the held-out workload. This is implemented here: https://github.com/mlcommons/algorithmic-efficiency/blob/c465e252c95521c223530b0523feaa38c6dd06e4/scoring/performance_profile.py#L322-L328

However, `variant_criteria_filter()` only checks for `np.inf` values (https://github.com/mlcommons/algorithmic-efficiency/blob/c465e252c95521c223530b0523feaa38c6dd06e4/scoring/performance_profile.py#L245-L257). Another invalid score that can occur is a `nan`, e.g. when a run goes OOM. In this case, the base workload score should also be ignored.

This PR fixes this issue. To properly do so, it also needs to load the list of held-out workloads (to drop all other workload variants that have only been computed for the baseline).
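A minimal sketch of the idea (function name hypothetical, not the actual code in `performance_profile.py`): treating both `np.inf` and `nan` as invalid can be done with a single `np.isfinite` check instead of comparing against `np.inf` only:

```python
import numpy as np

def is_invalid_score(score: float) -> bool:
    # An inf score means the held-out target was never hit; a nan score
    # occurs e.g. when the run OOMs. Both should cause the corresponding
    # base workload score to be ignored. np.isfinite catches both cases,
    # unlike an equality check against np.inf alone (which misses nan).
    return not np.isfinite(score)
```

For example, `is_invalid_score(np.nan)` and `is_invalid_score(np.inf)` are both `True`, while a regular finite score passes the filter.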