Closed luofuli closed 3 years ago
Hi, I have the following three questions about the experiment on EN-RO.
1) Compared to Transformer-based models, which are trained for 300K iterations with a batch size of 128K tokens, this is very small. We have not done any hyper-parameter tuning; we adapted settings from existing Transformer-based models. We believe that a good hyper-parameter search would further improve performance.
https://arxiv.org/pdf/1904.09324.pdf
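For context, the effective batch size in tokens in a fairseq-train run is roughly --max-tokens x --update-freq x number of GPUs. The sketch below is only illustrative: it uses a generic Transformer architecture and made-up paths/values, not the DeLighT configuration from this repo.

```bash
# Illustrative only: 4096 tokens/GPU x 8 accumulation steps x 4 GPUs ~= 128K tokens per update.
fairseq-train data-bin/wmt16_en_ro \
    --arch transformer --share-all-embeddings \
    --optimizer adam --lr 0.0007 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --max-tokens 4096 --update-freq 8 --max-update 300000
```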
2) wpb stands for words processed per batch. For more information, check the Fairseq documentation.
3) We used the default settings in Fairseq for evaluation on the WMT16 En-Ro dataset.
https://github.com/sacmehta/delight/blob/master/readme_files/nmt/wmt16_en2ro.md#evaluation
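For reference, evaluation with fairseq-generate under mostly default settings looks roughly like the sketch below; the checkpoint and data-bin paths are placeholders, and the exact command and flags for this model are in the linked readme.

```bash
# Hypothetical paths; see the linked readme for the exact evaluation command.
fairseq-generate data-bin/wmt16_en_ro \
    --path checkpoints/checkpoint_best.pt \
    --gen-subset test --beam 5 --lenpen 1.0 \
    --batch-size 128 --remove-bpe
```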
Thanks a million. Can you provide the data processing scripts for WMT16 EN-RO? I see that you compute the BLEU score on the processed test set in Fairseq's generate.py. I just want to use the same train/valid and, most importantly, the same test-set processing (e.g., remove-diacritics.py, normalise-romanian.py) as you, but binarize with my own dictionary.
We followed the instructions in Fairseq and didn't do anything special or different. You can follow the same instructions.
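If the goal is to binarize with your own vocabulary, one option (a minimal sketch with hypothetical file names, not the exact command from this repo) is to pass an existing dictionary to fairseq-preprocess:

```bash
# Hypothetical paths; --srcdict/--tgtdict reuse an existing dictionary instead of building a new one.
fairseq-preprocess --source-lang en --target-lang ro \
    --trainpref data/train.bpe --validpref data/valid.bpe --testpref data/test.bpe \
    --srcdict data/dict.en.txt --tgtdict data/dict.ro.txt \
    --destdir data-bin/wmt16_en_ro --workers 8
```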
Does Fairseq provide data pre-processing scripts for EN-RO?
If you have a dataset, data processing is a standard sequence of steps (see the sketch after this list):
1) Select the number of BPE tokens
2) Learn BPE
3) Apply BPE
4) Binarize the dataset
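A minimal sketch of those four steps with subword-nmt and fairseq-preprocess; the merge count, paths, and file names are illustrative, and any Romanian-specific normalization (normalise-romanian.py, remove-diacritics.py) would be applied to the raw text before this.

```bash
# 1) + 2) Pick the number of BPE merges and learn a joint BPE model (illustrative value).
BPE_TOKENS=40000
cat data/train.en data/train.ro | subword-nmt learn-bpe -s $BPE_TOKENS -o data/bpe.codes

# 3) Apply BPE to every split and language.
for lang in en ro; do
    for split in train valid test; do
        subword-nmt apply-bpe -c data/bpe.codes < data/$split.$lang > data/$split.bpe.$lang
    done
done

# 4) Binarize with Fairseq, sharing one dictionary across the language pair.
fairseq-preprocess --source-lang en --target-lang ro \
    --trainpref data/train.bpe --validpref data/valid.bpe --testpref data/test.bpe \
    --joined-dictionary --destdir data-bin/wmt16_en_ro --workers 8
```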
Closing now because of inactivity. Please feel free to reopen.