mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

Transformer Quality Target Change #171

Open bitfort opened 5 years ago

bitfort commented 5 years ago

Note to follow up on the current Transformer quality target (BLEU 25 -> 27?).
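The quality target discussed in this thread is a BLEU score (the logs report "Bleu score (uncased)"). As a rough illustration of what that number measures, here is a minimal corpus-BLEU sketch. It assumes whitespace tokenization, a single reference per sentence, and lowercasing to mimic "uncased". It is not the MLPerf reference scorer, which uses its own tokenization, so absolute values will differ. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Uncased corpus BLEU on a 0-100 scale, one reference per sentence.

    Assumption: plain whitespace tokenization, not the reference tokenizer.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c = cand.lower().split()
        r = ref.lower().split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            c_counts = ngrams(c, n)
            r_counts = ngrams(r, n)
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(cnt, r_counts[g]) for g, cnt in c_counts.items())
            totals[n - 1] += max(len(c) - n + 1, 0)
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize a candidate corpus not longer than the references.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100.0 * bp * math.exp(log_precision)
```

Raising the target from 25 to 27 means the model's clipped n-gram precisions (and length) must match the references noticeably more closely under whatever scorer the benchmark fixes.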

bitfort commented 5 years ago

SWG Notes:

We intend to move the quality target to 27. There is an AI (action item) to modify the reference and confirm that it reaches the target.
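Confirming that the reference "reaches the target" amounts to finding when its evaluation score first meets or exceeds the threshold. A minimal sketch, assuming one BLEU evaluation per epoch; `epochs_to_target` is a hypothetical helper, not part of the reference code:

```python
from typing import Optional, Sequence


def epochs_to_target(bleu_per_epoch: Sequence[float], target: float) -> Optional[int]:
    """Return the 1-based epoch at which BLEU first reaches `target`, or None."""
    for epoch, bleu in enumerate(bleu_per_epoch, start=1):
        if bleu >= target:
            return epoch
    return None  # the run never reached the target
```

A run that returns `None` at 27 but a small epoch count at 25 is exactly the situation the action item is meant to rule out for the reference.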

bitfort commented 5 years ago

SWG Notes:

AI(Cray): Check the target quality on English to French and English to German. Related to: https://github.com/mlperf/policies/issues/175

bitfort commented 5 years ago

SWG Notes:

(English to German) The published accuracy is 28.4. We are not yet able to hit 27 at the reference batch size and are continuing the parameter search. We expect the reference to hit 27, but only with changes to the learning rate / batch size.
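Since hitting 27 is expected to require learning-rate and batch-size changes, it helps to recall the shape of the schedule being tuned. The sketch below implements the inverse-square-root warmup schedule from the original Transformer paper (Vaswani et al., 2017). Whether the MLPerf reference uses exactly this schedule is an assumption, as are the defaults `d_model=1024` and `warmup_steps=4000`; `scale` is a hypothetical multiplier standing in for the learning-rate change discussed above.

```python
def noam_lr(step: int, d_model: int = 1024, warmup_steps: int = 4000, scale: float = 1.0) -> float:
    """lr = scale * d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5).

    Rises linearly for `warmup_steps` steps, then decays as 1/sqrt(step).
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The peak learning rate occurs at `step == warmup_steps`. Changing the global batch size changes how many optimizer steps an epoch takes, so the warmup length and `scale` usually have to be revisited together.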

(English to German) Google believes 27 can be hit at a global batch size of ~64k tokens. Above that, Google hasn't been able to converge yet, but is still exploring. Reaching 27 takes roughly twice as many epochs as reaching 25.
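A global batch size of ~64k tokens counts tokens rather than sentences. The sketch below shows one common way to form such batches: sort sentences by length, then greedily pack them so that the padded batch size (sentence count times longest sentence) stays within a token budget, and split the global budget across data-parallel workers. The padding-aware counting, the helper names, and the worker count in the usage note are assumptions, not the reference implementation's exact rules.

```python
from typing import List, Sequence


def batch_by_tokens(lengths: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Greedily group sentence indices so each batch's padded size
    (num_sentences * longest_sentence) stays within max_tokens.
    Sorting by length first keeps padding small.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    longest = 0
    for i in order:
        new_longest = max(longest, lengths[i])
        if current and new_longest * (len(current) + 1) > max_tokens:
            # Adding this sentence would exceed the budget: close the batch.
            batches.append(current)
            current = []
            new_longest = lengths[i]
        current.append(i)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


def per_worker_tokens(global_batch_tokens: int, num_workers: int) -> int:
    """Split a global token budget evenly across data-parallel workers."""
    return global_batch_tokens // num_workers
```

For example, `per_worker_tokens(65536, 8)` gives 8192 tokens per worker. A single sentence longer than `max_tokens` still ends up alone in its own, oversized batch.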

(English to French) The published accuracy is 43. Google has seen around 41, but the investigation is ongoing.

Continuing Cray AI. AI(Google): Explore English to French at scale (non-reference).

bitfort commented 5 years ago

SWG Notes:

We feel that variance is a concern here, especially at a target of 27. We'd like to raise the accuracy target, but want more information on variance before setting it.

AI(Cray & Google & CISCO): Do some runs to 26 to look at variance (and provide data for 25.5 too).
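To look at variance at both 25.5 and 26 from the same runs, one can record each run's per-epoch BLEU curve and summarize epochs-to-target per threshold. A minimal sketch, assuming one evaluation per epoch; `variance_report` and its output fields are hypothetical:

```python
import statistics
from typing import Dict, Optional, Sequence


def epochs_to_target(curve: Sequence[float], target: float) -> Optional[int]:
    """1-based epoch at which BLEU first reaches `target`, or None if never."""
    for epoch, bleu in enumerate(curve, start=1):
        if bleu >= target:
            return epoch
    return None


def variance_report(curves: Sequence[Sequence[float]], targets: Sequence[float]) -> Dict[float, dict]:
    """For each target, summarize epochs-to-target across all runs."""
    report: Dict[float, dict] = {}
    for target in targets:
        hits = [e for e in (epochs_to_target(c, target) for c in curves) if e is not None]
        report[target] = {
            "runs": len(curves),
            "converged": len(hits),  # runs that reached the target at all
            "mean_epochs": statistics.mean(hits) if hits else None,
            # Sample standard deviation needs at least two converged runs.
            "stdev_epochs": statistics.stdev(hits) if len(hits) > 1 else None,
        }
    return report
```

Reporting the converged count alongside the mean keeps runs that never reach a higher target (such as 27) from silently disappearing from the variance estimate.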

jbalma commented 5 years ago

I was able to get 8 Transformer reference runs in and saw convergence to 26.0 on English to German within 5 epochs for 5 of the 8 runs, and within 6 epochs for the remaining 3.

Here is the relevant grep from the logs:

grep "Bleu score (uncased)" mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_*_new/translation/logfile | grep ": 26"
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_0_new/translation/logfile:Bleu score (uncased): 26.452380418777466
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_1_new/translation/logfile:Bleu score (uncased): 26.39443278312683
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_2_new/translation/logfile:Bleu score (uncased): 26.0280579328537
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_3_new/translation/logfile:Bleu score (uncased): 26.264476776123047
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_4_new/translation/logfile:Bleu score (uncased): 26.29130184650421
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_5_new/translation/logfile:Bleu score (uncased): 26.16676688194275
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_6_new/translation/logfile:Bleu score (uncased): 26.01703405380249
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_7_new/translation/logfile:Bleu score (uncased): 26.256629824638367

mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_0_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_1_new/translation/logfile:Starting iteration 6
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_2_new/translation/logfile:Starting iteration 6
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_3_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_4_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_5_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_6_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_7_new/translation/logfile:Starting iteration 6
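The eight runs above can also be summarized programmatically. This sketch parses grep-style `path:message` lines like those shown, keeping the last BLEU value and the highest "Starting iteration" number per run, and treats that iteration as the epoch count (following the interpretation in the comment above). The values in `REPORTED_RUNS` are copied from the log output; the regexes assume the directory naming seen there, and all names are hypothetical.

```python
import re
import statistics
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumes the run-directory naming seen in the grep output above.
RUN_RE = re.compile(r"eng_to_germ_(\d+)_new/")
BLEU_RE = re.compile(r"Bleu score \(uncased\): ([0-9.]+)")
ITER_RE = re.compile(r"Starting iteration (\d+)")


def parse_grep_lines(lines: Iterable[str]) -> Dict[int, Tuple[float, int]]:
    """Map run id -> (last BLEU seen, highest iteration seen)."""
    bleu: Dict[int, float] = {}
    iters: Dict[int, int] = {}
    for line in lines:
        run = RUN_RE.search(line)
        if run is None:
            continue
        run_id = int(run.group(1))
        if (m := BLEU_RE.search(line)) is not None:
            bleu[run_id] = float(m.group(1))
        elif (m := ITER_RE.search(line)) is not None:
            iters[run_id] = max(iters.get(run_id, 0), int(m.group(1)))
    # Keep only runs for which both a BLEU value and an iteration were found.
    return {r: (bleu[r], iters[r]) for r in sorted(bleu) if r in iters}


def summarize(runs: Dict[int, Tuple[float, int]]) -> dict:
    """Mean/spread of final BLEU and the distribution of epochs needed."""
    scores = [b for b, _ in runs.values()]
    epochs = [e for _, e in runs.values()]
    return {
        "runs": len(runs),
        "bleu_mean": statistics.mean(scores),
        "bleu_stdev": statistics.stdev(scores),
        "bleu_min": min(scores),
        "epoch_counts": dict(sorted(Counter(epochs).items())),
        "epochs_mean": statistics.mean(epochs),
    }


# (final BLEU, epochs) per run, copied from the log output above.
REPORTED_RUNS = {
    0: (26.452380418777466, 5),
    1: (26.39443278312683, 6),
    2: (26.0280579328537, 6),
    3: (26.264476776123047, 5),
    4: (26.29130184650421, 5),
    5: (26.16676688194275, 5),
    6: (26.01703405380249, 5),
    7: (26.256629824638367, 6),
}

if __name__ == "__main__":
    print(summarize(REPORTED_RUNS))
```

In practice `parse_grep_lines` would be fed the combined output of both greps, so that the summary is rebuilt from the logfiles rather than from hand-copied values.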

bitfort commented 5 years ago

SWG Notes:

No change to the target accuracy for v0.6. We think that for v0.7 we can move to a target quality of 27, given more time to work on the issue.

petermattson commented 4 years ago

Active, moving to backlog.