mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

unifying training and inference implementation equivalence requirements? #260

Open christ1ne opened 4 years ago

christ1ne commented 4 years ago

The current training rules require all implementations to be mathematically equivalent to the reference, except for a few cases. However, the inference rules allow optimizations as long as they are not blacklisted. Shall we make the implementation equivalence requirements consistent across training and inference? What's the decision for training v0.7?

training: https://github.com/mlperf/training_policies/blob/master/training_rules.adoc#equivalence-exceptions

inference: https://github.com/mlperf/inference_policies/blob/master/inference_rules.adoc#82-model-equivalence

tjablin commented 4 years ago

I'm not in favor of unifying the Inference and Training rules at this time. I think Inference can function with more permissive rules that match industry practice, but Training needs more restrictive rules to avoid chaos.

DilipSequeira commented 4 years ago

NVIDIA agrees with both Intel & Google :)

I don't think we should align inference and training, in general. But I do think that the fundamental guideline for model equivalence is better as intensional (e.g. mathematical) equivalence rather than extensional ("whatever works") equivalence.

Or we say we want MLPerf to be like real deployment, where you get to fine-tune, sparsify and utilize whatever techniques are representative of how you believe customers should be deploying that network on your accelerator.

bitfort commented 4 years ago

General Thoughts:

One way of looking at it: people can look at training boxes and introspect them, whereas people look at inference as more of a "black box" solution. These views roughly match the strict versus open-ended rules. Another thought: it is quite difficult to change the rules at this time, so this would require a lot of discussion, work, and consideration.

Perhaps it would be useful to add explanations and background to the rules, or to find other ways to explain why we have these rules and why we ask people to follow them.

Also, people can confuse training and inference rules.

We will discuss further in our community meeting.