Try Amos for t5-large.
Evaluate results and speed on WDSQ and MINTAK.
Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale
Amos is a new optimizer that we propose to pre-train large language models. It is more efficient and converges faster than AdamW: ≤ 51% memory for slot variables, and better validation loss within ≤ 70% training time!
arXiv: https://arxiv.org/abs/2210.11693
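To make the title concrete, below is a minimal toy sketch of the two ingredients it names: an Adam-style, second-moment-normalized gradient step combined with a weight-decay term whose strength adapts to a per-tensor target scale eta. This is an illustration under simplifying assumptions, not the published Amos update rule; the function name `toy_amos_like_step`, the particular decay formula, and all hyperparameter values are placeholders for exposition. See the paper for the exact algorithm and its derivation.

```python
# Toy illustration (NOT the published Amos algorithm) of an Adam-style update
# plus a weight decay that adapts to a per-tensor target scale `eta`.
import jax.numpy as jnp


def toy_amos_like_step(param, grad, v, step, *,
                       lr=1e-3, beta2=0.999, eps=1e-8, eta=1.0):
    """One update for a single weight tensor.

    param, grad, v : arrays of the same shape (v is the running second moment).
    step           : 1-based step count, used for bias correction.
    eta            : assumed target RMS scale of the trained tensor.
    """
    # Adam-style second-moment accumulator with bias correction.
    v = beta2 * v + (1.0 - beta2) * jnp.square(grad)
    v_hat = v / (1.0 - beta2 ** step)
    adaptive_grad = grad / (jnp.sqrt(v_hat) + eps)

    # Scale-adaptive decay: decay harder once the tensor's RMS exceeds eta,
    # so the trained weights settle near the chosen target scale.
    rms = jnp.sqrt(jnp.mean(jnp.square(param)) + eps)
    decay = (rms / eta) * param

    new_param = param - lr * (adaptive_grad + decay)
    return new_param, v
```

In the paper, eta is derived from model-specific information (roughly, the scale each trained weight tensor is expected to have), which is what "model-oriented scale" refers to; for actual pre-training, use the reference implementation rather than this sketch.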