Open bitfort opened 4 years ago
We brought this up two weeks ago and Google specifically objected to evaluating more frequently. The 3M number was proposed by a Google engineer.
I think any change at this point is bad - some companies may have already started their submission runs since there are only 10 days left.
Suggest we defer this to 0.8.
SWG:
We had a long discussion about BERT evaluation today.
The resolution from this discussion:
We want to make sure we have the following takeaways from this conversation:
We chose to start evaluating BERT at 3M samples under the belief that everyone would converge after 3M samples. We have evidence this is not true. We want to discuss lifting this constraint (in light of the wrong assumption we made when writing the rules) and evaluating every 500K samples instead.
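
To make the difference concrete, here is a rough sketch of the evaluation points under each schedule. The `eval_points` helper and the 4M-sample cap are hypothetical, for illustration only. It also assumes that evaluations under the current rule repeat every 500K samples after the 3M start, which this thread does not state.

```python
def eval_points(first_eval, interval, max_samples):
    """Return the sample counts at which an evaluation would run.

    Hypothetical helper for illustration; not part of any reference code.
    """
    return list(range(first_eval, max_samples + 1, interval))


# Arbitrary 4M-sample cap, chosen only to keep the lists short.
MAX_SAMPLES = 4_000_000

# Current rule: first evaluation at 3M samples (500K interval is an assumption).
current = eval_points(3_000_000, 500_000, MAX_SAMPLES)

# Proposed rule: evaluate every 500K samples from the start.
proposed = eval_points(500_000, 500_000, MAX_SAMPLES)

print(current)   # evaluation points under the current rule
print(proposed)  # evaluation points under the proposal
```

Under these assumptions, a run that converges before 3M samples cannot be observed converging until the 3M evaluation under the current rule, while the proposed schedule could catch it at the next 500K boundary.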