mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

[HPC] Discussion about reusing results from previous rounds #502

Closed: sparticlesteve closed this issue 1 year ago

sparticlesteve commented 2 years ago

This issue is for discussing something that has come up a couple of times in our WG meetings recently. It's about whether submitters can and/or should be able to reuse (i.e. re-submit) results from a previous submission round.

Let's try to gather input on this here. Do we foresee any problems? Anything that would require clarification or adjustment in the rules?

azrael417 commented 2 years ago

The biggest issue I can think of is that log compliance requirements may change, leaving old logs no longer compliant with the current rules. We need to define rules for what happens in these cases.

Another issue is that one could submit results from retired systems, and such results are not verifiable. In practice this is not a big issue, since verification is difficult anyway, but it would require adjusting the rules in this regard.

sparticlesteve commented 2 years ago

Thanks, Thorsten.

I agree that if rules/benchmarks change, it causes issues for reusing results. I'd support allowing result reuse only in cases where the submission is still compliant (i.e., not disqualified by any rules that changed), so it would work for relatively "stable" benchmarks.

Non-verifiability is another good one. I recall that one idea mentioned in the meeting was to disallow results from retired, non-verifiable systems. Another was to add a new availability category like "retired", "legacy", "resubmitted", or something along those lines.

memani1 commented 2 years ago

Agreed. We could perhaps allow reusing results for benchmarks whose reference implementations have not changed since the previous submission round and whose previous logs are still compliant with the latest rules for the current round. The results should be clearly marked either way, whether they remain valid or not.

hanyunfan commented 2 years ago

Agreed. Maybe the WG Chair can run the new tag of the checker on the previous submission results, keep only the ones that pass, and merge them into the new repo ahead of the new submission window, so everyone can review them in advance. Results that don't pass the new tag wouldn't be copied over, but they would still stay in the previous results table. A sketch of such a filtering step is shown below.
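
As a rough illustration of that filtering step, here is a minimal sketch (not from this thread) that runs the current round's compliance checker over the previous round's logs and stages only the passing ones. The directory layout, the `--usage hpc` and `--ruleset` flags, and the ruleset value are assumptions; the exact invocation should follow the checker tag from mlcommons/logging that is pinned for the round.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: filter previous-round result logs through the
current round's compliance checker and stage the passing ones.

Assumptions (not from the thread): one *.txt log per run under PREV_DIR,
a STAGE_DIR for results to be carried into the new repo, and a checker
invoked as `python3 -m mlperf_logging.compliance_checker`.
"""
import pathlib
import shutil
import subprocess
import sys

PREV_DIR = pathlib.Path("previous_round_results")   # hypothetical layout
STAGE_DIR = pathlib.Path("staged_for_new_round")    # hypothetical layout
RULESET = "2.0.0"                                    # assumed; use the round's ruleset

STAGE_DIR.mkdir(exist_ok=True)

for log in sorted(PREV_DIR.glob("**/*.txt")):
    # Run the checker on one result log; a non-zero exit code is treated
    # here as "not compliant with the new rules".
    proc = subprocess.run(
        [sys.executable, "-m", "mlperf_logging.compliance_checker",
         "--usage", "hpc", "--ruleset", RULESET, str(log)],
        capture_output=True, text=True,
    )
    if proc.returncode == 0:
        # Passing logs are copied into the staging area for review
        # ahead of the submission window.
        dest = STAGE_DIR / log.relative_to(PREV_DIR)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(log, dest)
    else:
        print(f"SKIP (fails new ruleset): {log}")
```

In practice the chair would more likely run the full submission checker over whole submission directories rather than individual logs; this sketch only illustrates the pass/fail filtering idea.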

nvaprodromou commented 2 years ago

It turns out there is precedent for this from MLPerf Training, and the decision was to allow reuse of old results, assuming the benchmark hasn't changed. Any benchmark changes and rule updates from the current round would still apply, so your final score may change; for example, new RCPs (reference convergence points) could make your submission non-compliant.

MLCommons cannot and should not dictate when submitters do their runs. They could run at code freeze, or even before code freeze, depending on contention with other internal parties, since there are many internal scheduling constraints on these very expensive machines. If a submitter does a run very early and it still complies with the current round, that submitter should not be penalized.

sparticlesteve commented 2 years ago

Thanks, @nvaprodromou. We will follow the Training group's precedent for the time being, unless/until we decide that HPC should have specific rules overriding it.

sparticlesteve commented 1 year ago

Closing since this was resolved in the discussions.