Empirical results on training NMT in a large-scale e-commerce setting at Booking.com
Covers optimization, training, and evaluation
Details
Model Architecture
4-layer LSTM written in Lua (a rough PyTorch sketch of the architecture follows this list)
Use global attention
Use "case" embedding feature
residual connections with dropout 0.3
batch size not reported
Handles named entities by pre-processing the input: NE tags are detected in both source and target sentences, replaced with a placeholder, and the original surface form is copied back through via the attention map (sketched below)
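As a rough illustration of the architecture bullets above, here is a minimal sketch assuming PyTorch (the paper's actual implementation is in Lua/Torch); the vocabulary, embedding, and hidden sizes and the 4-way case vocabulary are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, vocab_size, num_cases=4, emb=500, case_emb=12, hidden=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb)
        # "case" feature: e.g. lowercase / Capitalized / UPPER / other
        self.case_emb = nn.Embedding(num_cases, case_emb)
        # 4 LSTM layers with dropout 0.3 between layers; the residual
        # connections noted above would need a custom layer stack and
        # are omitted here for brevity
        self.rnn = nn.LSTM(emb + case_emb, hidden, num_layers=4,
                           dropout=0.3, batch_first=True)

    def forward(self, tokens, cases):
        x = torch.cat([self.tok_emb(tokens), self.case_emb(cases)], dim=-1)
        return self.rnn(x)  # (all encoder states, final (h, c))

class GlobalAttention(nn.Module):
    """Luong-style global (dot) attention over all encoder states."""
    def forward(self, dec_state, enc_outputs):
        # dec_state: (B, H); enc_outputs: (B, T, H)
        scores = torch.bmm(enc_outputs, dec_state.unsqueeze(2)).squeeze(2)
        weights = F.softmax(scores, dim=-1)  # the attention map
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights
```

And a hedged sketch of the named-entity placeholder scheme: the regex detector and the German output string are toy stand-ins; the paper relies on an actual NE tagger and on the attention map to align each placeholder during decoding.

```python
import re

PLACEHOLDER = "__NE_{}__"

def replace_entities(sentence, pattern=r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b"):
    """Swap detected entities for indexed placeholders before translation."""
    entities = []
    def _sub(match):
        entities.append(match.group(0))
        return PLACEHOLDER.format(len(entities) - 1)
    return re.sub(pattern, _sub, sentence), entities

def restore_entities(translation, entities):
    """Copy the original surface forms back into the translated output."""
    for i, entity in enumerate(entities):
        translation = translation.replace(PLACEHOLDER.format(i), entity)
    return translation

src, ents = replace_entities("Hotel Adlon is near Brandenburg Gate")
# src == "__NE_0__ is near __NE_1__"  -> translate this with the NMT model
out = restore_entities("__NE_0__ liegt nahe __NE_1__", ents)
```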
Optimizer
1M En-De dataset
SGD vs Adam vs Adagrad vs Adadelta (learning rates 1.0, 0.0002, 0.1, and 1.0 respectively; a configuration sketch follows this list)
SGD performs best
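A minimal sketch of that optimizer comparison, assuming PyTorch (the experiments themselves ran in Lua/Torch); the Linear model is a stand-in for the NMT network, and only the four learning rates come from the notes above.

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the NMT network
optimizers = {
    "sgd":      torch.optim.SGD(model.parameters(), lr=1.0),
    "adam":     torch.optim.Adam(model.parameters(), lr=0.0002),
    "adagrad":  torch.optim.Adagrad(model.parameters(), lr=0.1),
    "adadelta": torch.optim.Adadelta(model.parameters(), lr=1.0),
}
# train one run per optimizer on the 1M En-De set and compare;
# in the paper's experiments plain SGD came out on top
```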
Multi-GPU
Async vs Sync Multi-GPU
single GPU performs best, which is the opposite of our in-house result (a toy contrast of the two schemes is sketched below)
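A toy sketch of the two update schemes, assuming PyTorch on CPU with sequential "workers" standing in for GPUs; the real setup is multi-GPU training of the full NMT model.

```python
import copy
import torch
import torch.nn.functional as F

torch.manual_seed(0)
master = torch.nn.Linear(8, 1)  # stand-in for the NMT network
opt = torch.optim.SGD(master.parameters(), lr=0.1)

def sync_update(batches):
    """Synchronous: every replica computes gradients on its own batch,
    gradients are averaged, then the parameters are stepped once."""
    replicas = [copy.deepcopy(master) for _ in batches]
    for rep, (x, y) in zip(replicas, batches):
        F.mse_loss(rep(x), y).backward()
    for p, *reps in zip(master.parameters(),
                        *[r.parameters() for r in replicas]):
        p.grad = torch.stack([rp.grad for rp in reps]).mean(dim=0)
    opt.step()
    opt.zero_grad()

def async_update(batches):
    """Asynchronous (Hogwild-style): each worker steps the shared
    parameters as soon as its gradient is ready, with no barrier.
    Run sequentially here as a stand-in for concurrent workers."""
    for x, y in batches:
        opt.zero_grad()
        F.mse_loss(master(x), y).backward()
        opt.step()

batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(2)]
sync_update(batches)
async_update(batches)
```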
Corpus Size
corpora of 1M, 2.5M, 5M, 7.5M, and 10M sentence pairs, each run for 90M iterations
10M performs best overall, with higher human-evaluation scores that are not fully reflected in BLEU (the more data, the better)
Evaluation
Human evaluation with Adequacy + Fluency metrics (a toy score-aggregation sketch follows)
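A toy sketch of aggregating such judgments, assuming a 1-4 rating scale per sentence; the scale and the data are illustrative assumptions, not values from the paper.

```python
from statistics import mean

# illustrative annotator judgments: (adequacy, fluency) per sentence
ratings = [(4, 4), (3, 4), (4, 3), (2, 3)]
adequacy = mean(a for a, _ in ratings)
fluency = mean(f for _, f in ratings)
print(f"adequacy={adequacy:.2f} fluency={fluency:.2f}")
```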
Personal Thoughts
Solid work and experiments on NMT
Their in-house data seems abundant and strong
Good to see that they openly publish their results
Link : https://arxiv.org/pdf/1709.05820.pdf Authors : Levin et al. 2017