pip install adabelief-pytorch
pip install --editable .
sh prepare-iwslt14.sh
sh config/adabelief.sh
The BLEU scores on my local machine (PyTorch 1.1, CUDA 9.0) are roughly: AdamW: 35.60, RAdam: 35.51, AdaBelief: 35.85. Results can vary with randomness, but they are all above 35.
When I tested AdaBelief with PyTorch 1.4 and PyTorch 1.6, the BLEU score was always below 30. Furthermore, the gradient norm with PyTorch 1.1 is always below 1.0, while with higher PyTorch versions the gradient norm explodes to 2 or more.
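One way to check for the gradient-norm symptom described above is to collect the per-step gradient norms from a training run and flag the steps where the norm gets large. The following is a minimal sketch in plain Python, not part of this repository: the helper names `global_grad_norm` and `flag_explosion` are hypothetical, and the default threshold of 2.0 is an assumption taken from the "2 or more" figure above.

```python
import math


def global_grad_norm(per_param_norms):
    """Combine per-parameter L2 norms into a single global L2 norm."""
    return math.sqrt(sum(n * n for n in per_param_norms))


def flag_explosion(norm_history, threshold=2.0):
    """Return the step indices whose gradient norm reaches the threshold.

    threshold=2.0 is an assumption based on the norms reported above;
    adjust it for your own runs.
    """
    return [step for step, norm in enumerate(norm_history) if norm >= threshold]


if __name__ == "__main__":
    # Hypothetical norms, for illustration only (not measured results).
    history = [0.5, 0.9, 2.3, 1.0, 2.0]
    print(flag_explosion(history))
```

If a run on a newer PyTorch version flags many steps while a PyTorch 1.1 run flags none, that would be consistent with the behavior reported here.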
This discrepancy seems to be caused by a version incompatibility between the fairseq version used here (<=0.8) and newer PyTorch versions.
The code here works fine with PyTorch 1.1.
When using PyTorch 1.6, AdaBelief (same code as here) works fine with the latest fairseq implementation.
Code for the transformer that works with PyTorch 1.6 is at https://github.com/juntang-zhuang/fairseq-adabelief
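To avoid running this code on a PyTorch version where low BLEU was observed, a small guard can classify the installed version before training. This is a hedged sketch and not part of the repository: the function names and the status labels are hypothetical, and the version sets only encode what is reported in this README (1.1 works; 1.4 and 1.6 gave BLEU below 30 with the fairseq used here). Any other version is treated as untested.

```python
import re

# Outcomes reported in this README for the fairseq used here (<=0.8).
VERIFIED_OK = {(1, 1)}
REPORTED_BAD = {(1, 4), (1, 6)}


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.6.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def bundled_fairseq_status(torch_version):
    """Classify a PyTorch version as 'ok', 'reported-bad', or 'untested'."""
    key = major_minor(torch_version)
    if key in VERIFIED_OK:
        return "ok"
    if key in REPORTED_BAD:
        return "reported-bad"
    return "untested"


if __name__ == "__main__":
    print(bundled_fairseq_status("1.1.0"))
    print(bundled_fairseq_status("1.6.0+cu101"))
```

In a real environment the argument would be `torch.__version__`. A "reported-bad" result suggests switching to the PyTorch 1.6 code linked above instead of the code here.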