mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training

is unpadding allowed for BERT? #376

Open christ1ne opened 4 years ago

christ1ne commented 4 years ago

The reference pads to a fixed sequence length. Does submitted code need to do that, or can we allow unpadding?
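
For context, a minimal sketch of what padding to a fixed sequence length looks like in pre-processing. The helper name `pad_to_max_length` and the values below are illustrative, not the reference implementation's actual code:

```python
def pad_to_max_length(token_ids, max_seq_length, pad_id=0):
    """Pad (or truncate) a list of token ids to exactly max_seq_length."""
    token_ids = token_ids[:max_seq_length]
    # Mask marks real tokens with 1 and padding with 0.
    attention_mask = [1] * len(token_ids) + [0] * (max_seq_length - len(token_ids))
    token_ids = token_ids + [pad_id] * (max_seq_length - len(token_ids))
    return token_ids, attention_mask

# Example: every sequence in the batch comes out with the same length.
ids, mask = pad_to_max_length([101, 2023, 2003, 102], max_seq_length=8)
assert len(ids) == len(mask) == 8
```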

tjablin commented 4 years ago

I think the rules allow unpadding.

Section 7.1 says, "May pad to arbitrary size (don’t be creative)." I interpret that to allow unpadded inputs.

tjablin commented 4 years ago

Wait, is this a question about training or inference? If this is a training question, I retract my previous answer. If this is an inference question, it is on the wrong repository.

christ1ne commented 4 years ago

I observed this in MLPerf training first. We should clarify for inference too.

bitfort commented 3 years ago

SWG:

We agree that this should be allowed, but it doesn't appear to be clear in the rules; we should clarify the rules here.

We should probably add wording under "9.5. Equivalence exceptions" to capture this issue.

We will revisit next week with wording to capture this issue.

bitfort commented 3 years ago

SWG:

Link to Proposal: https://docs.google.com/document/d/12grytcLPkQhU12pR0O02YtSiB87ezrVlNr9nrywCGtI/edit#

bitfort commented 3 years ago

SWG:

In summary, this proposal is to pack examples instead of padding them, both to avoid spending compute on padding and to better load-balance examples across devices. There are details about how to maintain mathematical equivalence, including unpacking for certain operations/layers.
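
A minimal sketch of the packing idea, using a greedy first-fit heuristic; the helper name `pack_sequences` is hypothetical and this is not necessarily the proposal's exact algorithm. Preserving mathematical equivalence would additionally require, e.g., attention masking so tokens from different packed sequences cannot attend to each other, which is not shown here:

```python
def pack_sequences(sequences, max_seq_length):
    """Greedy first-fit packing: each pack holds one or more whole sequences
    whose combined length does not exceed max_seq_length."""
    packs = []  # each pack is a list of sequences
    for seq in sorted(sequences, key=len, reverse=True):
        for pack in packs:
            if sum(len(s) for s in pack) + len(seq) <= max_seq_length:
                pack.append(seq)
                break
        else:
            # No existing pack has room; start a new one.
            packs.append([seq])
    return packs

# Example: three short sequences share one 8-token slot instead of three.
packs = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8], [9]], max_seq_length=8)
```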

Things we can clarify in the rules:

Rewrite the pre-processing rules to read as "follow the reference, with these exceptions"

AI (NV): look at how to craft easier-to-understand rules here.