Generally speaking, subword regularization needs more iterations (epochs) to train, since it acts as a form of data augmentation. How did you set up the learning rate schedule? Did you tune it on dev data?
I used the same learning rate schedule as Vaswani et al. (increase linearly for the first 4000 steps, then decrease proportionally to the inverse square root of the step number) and trained the models for 200 epochs.
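For reference, that schedule can be written as a small function; the `d_model=512` value here is an assumption (the base Transformer setting), not something stated above:

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate from Vaswani et al. (2017): linear warmup for
    `warmup_steps` steps, then inverse-square-root decay."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```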
You might want to try more epochs (say 1000), or early stopping with dev data.
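A minimal early-stopping sketch, assuming hypothetical `train_one_epoch` and `dev_bleu` callables and an arbitrary patience value:

```python
def train_with_early_stopping(model, train_one_epoch, dev_bleu,
                              max_epochs=1000, patience=10):
    """Stop when dev BLEU has not improved for `patience` consecutive epochs."""
    best_score, bad_epochs = float("-inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        score = dev_bleu(model)
        if score > best_score:
            best_score, bad_epochs = score, 0
            # checkpoint the best model here
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_score
```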
I'm developing a Transformer-based NMT system for low-resource English-Sinhala translation using a parallel corpus of 54k sentences (vocab size = 5k). I experimented with BPE and unigram as subword segmentation techniques, and further experimented with subword regularization and BPE-dropout.
For BPE-dropout, I used l=64, p=0.1 for training and l=1, p=0 for validation and testing, as stated in the paper. However, the BLEU score dropped by 3.8 compared to the original BPE model. For subword regularization, I experimented with l=64 and l=-1 and various values of alpha from 0.1 to 1, but the best BLEU score obtained was 0.3 lower than that of the original unigram-segmented model.
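For illustration, sampled segmentation with SentencePiece looks roughly like the sketch below; the model file name is a placeholder, and alpha=0.1 / nbest_size=-1 is just one of the settings mentioned above:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="unigram_5k.model")  # placeholder model file

sentence = "an example source sentence"

# Training: sample a segmentation on the fly (subword regularization).
# nbest_size=-1 samples from the full lattice; alpha smooths the distribution.
sampled = sp.encode(sentence, out_type=str, enable_sampling=True,
                    nbest_size=-1, alpha=0.1)

# Validation/testing: deterministic one-best segmentation (l=1, no sampling).
best = sp.encode(sentence, out_type=str)

# With a BPE model, enable_sampling=True uses alpha as the merge-dropout
# probability (BPE-dropout), provided the installed SentencePiece version supports it.
```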
Since the papers state that subword regularization and BPE-dropout increase the BLEU score for low-resource languages, what could be the reasons for these reduced BLEU scores? Is there any way to improve them?