sacmehta / delight

DeLighT: Very Deep and Light-Weight Transformers
MIT License

Question about WMT EN-RO #5

Closed: luofuli closed this issue 3 years ago

luofuli commented 4 years ago

Hi, I have the following three questions about the WMT16 EN-RO experiment.

sacmehta commented 4 years ago

1) Compared to Transformer-based models that are trained for 300K iterations with a batch size of about 128K tokens, this is very small. We have not done any hyper-parameter tuning; we adapted the settings from existing Transformer-based models. We believe that a good hyper-parameter search would further improve performance.

https://arxiv.org/pdf/1904.09324.pdf
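For reference, here is a minimal sketch of how a "300K updates, ~128K tokens per batch" budget is usually expressed with standard fairseq-train flags; the architecture name and numbers below are illustrative assumptions, not the exact DeLighT configuration (see the repo's training readme for that):

```bash
# Hedged sketch: "300K updates, ~128K tokens per effective batch" with generic fairseq-train flags.
# 4096 tokens per GPU step x --update-freq 32 = 131072 (~128K) tokens per effective batch on one GPU.
# The --arch value and optimizer settings are placeholders, not this repo's configuration.
fairseq-train data-bin/wmt16_en_ro \
  --arch transformer \
  --max-tokens 4096 --update-freq 32 \
  --max-update 300000 \
  --optimizer adam --lr 7e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1
```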

2) wpb is words processed per batch. For more information, check Fairseq.

3) We used the default settings in Fairseq for evaluation on the WMT16 En-Ro dataset.

https://github.com/sacmehta/delight/blob/master/readme_files/nmt/wmt16_en2ro.md#evaluation
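For context, this is roughly what default Fairseq evaluation looks like; the checkpoint path and beam settings below are assumptions, and the linked readme has the authoritative command:

```bash
# Hedged sketch: generic fairseq-generate evaluation on the binarized test set.
# Beam size, length penalty, and paths are illustrative; follow the linked readme for the exact command.
fairseq-generate data-bin/wmt16_en_ro \
  --path checkpoints/checkpoint_best.pt \
  --batch-size 128 --beam 5 --lenpen 1.0 \
  --remove-bpe
```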

luofuli commented 4 years ago

Thanks a million. Can you provide the data processing script for WMT16 EN-RO? I see that you compute the BLEU score with Fairseq's generate.py on the processed test set. I just want to use the same train/valid processing and, most importantly, the same test data processing (such as remove-diacritics.py and normalise-romanian.py) as you, but binarize with my own dictionary.

sacmehta commented 4 years ago

We followed the instructions in Fairseq and didn't do anything special or different. You can follow the same instructions.
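As a pointer, the Romanian-specific cleanup mentioned above usually sits at the start of that pipeline. This is only a sketch, and the script locations are assumptions (the Python scripts come from the wmt16-scripts repository, the Perl scripts from Moses):

```bash
# Hedged sketch: punctuation normalization, Romanian normalization, diacritic removal,
# and tokenization for the Romanian side, piped through the standard Moses and
# wmt16-scripts tools. Paths and file names are illustrative.
cat corpus.ro \
  | perl mosesdecoder/scripts/tokenizer/normalize-punctuation.perl -l ro \
  | python wmt16-scripts/preprocess/normalise-romanian.py \
  | python wmt16-scripts/preprocess/remove-diacritics.py \
  | perl mosesdecoder/scripts/tokenizer/tokenizer.perl -l ro \
  > corpus.tok.ro
```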

luofuli commented 4 years ago

Does fairseq provide data pre-processing scripts for EN-RO?

sacmehta commented 4 years ago

If you have a dataset, data processing follows a standard set of steps:

1) Select the number of BPE tokens
2) Learn BPE
3) Apply BPE
4) Binarize the dataset

https://github.com/pytorch/fairseq/blob/9ae39465daff48650ed7956adf2980a697016d50/examples/translation/prepare-wmt14en2de.sh#L88
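Here is a minimal sketch of those four steps using subword-nmt and fairseq-preprocess; the merge count, file names, and dictionary handling below are assumptions, and the linked prepare-wmt14en2de.sh is the actual reference:

```bash
# Hedged sketch: the four standard steps with subword-nmt + fairseq-preprocess.
# File names and the BPE merge count are illustrative assumptions.

# 1) + 2) Pick a BPE size and learn the codes on the concatenated training data
cat train.en train.ro > train.all
subword-nmt learn-bpe -s 40000 < train.all > bpe.codes

# 3) Apply BPE to every split and language
for split in train valid test; do
  for lang in en ro; do
    subword-nmt apply-bpe -c bpe.codes < $split.$lang > $split.bpe.$lang
  done
done

# 4) Binarize with fairseq; --joined-dictionary shares one vocabulary across en/ro.
#    To reuse an existing dictionary instead, pass --srcdict / --tgtdict.
fairseq-preprocess --source-lang en --target-lang ro \
  --trainpref train.bpe --validpref valid.bpe --testpref test.bpe \
  --destdir data-bin/wmt16_en_ro --joined-dictionary --workers 8
```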

sacmehta commented 3 years ago

Closing now because of inactivity. Please feel free to reopen.