mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

[HPC] Proposals to increase popularity with submitters, press, and entities seeking to purchase systems #513

Open nvaprodromou opened 1 year ago

nvaprodromou commented 1 year ago

This issue is intended to be used as an index for this collection of 6 proposals we recently presented to the WG.

Introduction:

After collecting feedback from engineers, clients, and press, NVIDIA presented a list of proposals that aim to improve the popularity of the MLPerf HPC benchmark suite. Please see our slide deck for more information on our feedback-gathering process and insights.

Proposal breakdown (prioritized from most urgent to least):

Timeline proposal:

We should decide on at least the top 3 proposals as soon as possible because those affect benchmark development and capacity reservations. We propose the following expedited timeline, which will likely require discussion by WG members offline through these GitHub issues.

sparticlesteve commented 1 year ago

Thanks a lot for putting together all these issues. It's really helpful.

We've had some discussions in our WG meetings but I'm still not sure we're ready to have final decisions by next week. I would suggest that we shoot for Mar 13 or 20.

I think the first two affect each other so I doubt we'll be able to decide on them fully independently, but maybe we could try to reach consensus or vote on at least the data movement one on Mar 13, and if we run out of time we can decide the throughput one on Mar 20. Does this sound reasonable?

nvaprodromou commented 1 year ago

@sparticlesteve @memani1 Should we form a task force for this so we can get some dedicated time and faster progress? Otherwise we are only opportunistically discussing the subjects whenever we find a gap in the agenda.

sparticlesteve commented 1 year ago

Maybe we can discuss the task force and meetings here, for visibility. I'm at a conference this week, but aside from Friday I can make most times work if we want to meet.

nvaprodromou commented 1 year ago

I'm available to meet at any time for this, as we believe it's critical for the HPC benchmark to move in this direction.