mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

Performance measurement for short run times and DVFS #265

Open TheKanter opened 4 years ago

TheKanter commented 4 years ago

DK: Discussion with an expert confirms that a 5-minute warm-up period would work for air-cooled systems. We still need to find details for liquid-cooled systems.

AIs:

DK to talk to liquid-cooling people.

Vendors to talk to their internal power management experts; please ask about liquid-cooled systems in particular.

bitfort commented 4 years ago

One idea is to start power measurement at the 8- or 16-chip scale, roughly one "box". All benchmarks appear to run longer than 5 minutes at this scale. We can look at other options for larger scales in the future.
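To illustrate why run time matters relative to the warm-up period, here is a minimal sketch of averaging power only after a 5-minute warm-up. The sample format (a list of `(seconds_since_start, watts)` tuples) and the function name `steady_state_average_power` are hypothetical, chosen for illustration; nothing here is an agreed MLPerf measurement procedure.

```python
# Warm-up period taken from the discussion above: 5 minutes.
WARMUP_SECONDS = 5 * 60


def steady_state_average_power(samples, warmup_seconds=WARMUP_SECONDS):
    """Average power over samples taken at or after the warm-up period.

    `samples` is a hypothetical format: (seconds_since_start, watts) tuples.
    Returns None when the run ends before the warm-up period does, which is
    the case for short runs at large scale.
    """
    steady = [watts for t, watts in samples if t >= warmup_seconds]
    if not steady:
        return None
    return sum(steady) / len(steady)


# A run that outlasts the warm-up: only the last two samples are counted.
long_run = [(0, 100.0), (299, 200.0), (300, 400.0), (600, 600.0)]
# A run shorter than the warm-up: nothing is left to measure.
short_run = [(0, 100.0), (120, 150.0)]

print(steady_state_average_power(long_run))
print(steady_state_average_power(short_run))
```

With these made-up samples, the long run averages only the readings at 300 s and later, while the short run yields no steady-state measurement at all, which is the motivation for starting at a scale where every benchmark runs longer than the warm-up.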

petermattson commented 4 years ago

Moving to backlog since there is no power measurement in v0.7.