mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training

is unpadding allowed for BERT? #376

Open christ1ne opened 4 years ago

christ1ne commented 4 years ago

The reference pads to a fixed sequence length. Does submitted code need to do that, or can we allow unpadding?
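
For context, a minimal sketch of what padding to a fixed sequence length looks like in pre-processing. The helper name `pad_to_max_length` and the values below are illustrative, not the reference implementation's actual code:

```python
def pad_to_max_length(token_ids, max_seq_length, pad_id=0):
    """Pad (or truncate) a list of token ids to exactly max_seq_length."""
    token_ids = token_ids[:max_seq_length]
    # Mask marks real tokens with 1 and padding with 0.
    attention_mask = [1] * len(token_ids) + [0] * (max_seq_length - len(token_ids))
    token_ids = token_ids + [pad_id] * (max_seq_length - len(token_ids))
    return token_ids, attention_mask

# Example: every sequence in the batch comes out with the same length.
ids, mask = pad_to_max_length([101, 2023, 2003, 102], max_seq_length=8)
assert len(ids) == len(mask) == 8
```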

tjablin commented 4 years ago

I think the rules allow unpadding.

Section 7.1 says, "May pad to arbitrary size (don’t be creative)." I interpret that to allow unpadded inputs.

tjablin commented 4 years ago

Wait, is this a question about training or inference? If this is a training question, I retract my previous answer. If this is an inference question, it is on the wrong repository.

christ1ne commented 4 years ago

I observed this in MLPerf training first. We should clarify for inference too.

bitfort commented 3 years ago

SWG:

We agree that this should be allowed, but it doesn't appear to be clear in the rules; we should clarify the rules here.

We should probably add wording under "9.5. Equivalence exceptions" to capture this issue.

We will revisit next week with wording to capture this issue.

bitfort commented 3 years ago

SWG:

Link to Proposal: https://docs.google.com/document/d/12grytcLPkQhU12pR0O02YtSiB87ezrVlNr9nrywCGtI/edit#

bitfort commented 3 years ago

SWG:

In summary, this proposal is to pack examples instead of padding them, both to avoid spending compute on padding and to better load-balance examples across devices. There are details about how to maintain mathematical equivalence, including unpacking for certain operations/layers.
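
A minimal sketch of the packing idea, using a greedy first-fit heuristic; the helper name `pack_sequences` is hypothetical and this is not necessarily the proposal's exact algorithm. Preserving mathematical equivalence would additionally require, e.g., attention masking so tokens from different packed sequences cannot attend to each other, which is not shown here:

```python
def pack_sequences(sequences, max_seq_length):
    """Greedy first-fit packing: each pack holds one or more whole sequences
    whose combined length does not exceed max_seq_length."""
    packs = []  # each pack is a list of sequences
    for seq in sorted(sequences, key=len, reverse=True):
        for pack in packs:
            if sum(len(s) for s in pack) + len(seq) <= max_seq_length:
                pack.append(seq)
                break
        else:
            # No existing pack has room; start a new one.
            packs.append([seq])
    return packs

# Example: three short sequences share one 8-token slot instead of three.
packs = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8], [9]], max_seq_length=8)
```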

Things we can clarify in the rules:

Rewrite the pre-processing rules to read as "follow the reference, with these exceptions"

AI (NV): look at how to craft easier-to-understand rules here.