the quality of prefixes is higher than that of suffixes in NMT output
method :
train an R2L NMT model by reversing the word order of the target-side data
rescore the n-best list from the original (L2R) NMT model with the R2L model's likelihood (see the sketch below)
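A minimal sketch of the R2L rescoring step, assuming a generic `score_fn(src, tgt)` callable that returns log P(tgt | src) under the R2L model (any seq2seq toolkit's scoring API would fit); the whitespace reversal convention is an assumption, the paper's exact tokenization may differ:

```python
from typing import Callable, List

def r2l_scores(
    score_fn: Callable[[str, str], float],  # assumed: score_fn(src, tgt) = log P(tgt | src)
    src: str,
    nbest: List[str],
) -> List[float]:
    """Score each L2R n-best candidate with the R2L model.

    The R2L model was trained on reversed target sentences, so each
    candidate is word-reversed before scoring.
    """
    return [score_fn(src, " ".join(reversed(hyp.split()))) for hyp in nbest]
```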
Target-to-Source NMT
translations may be inadequate on the target side, i.e. fail to cover all of the source content
method :
train a T2S NMT model (swap the source and target languages)
compute a likelihood score for each candidate under the T2S model as well (see the sketch below)
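A matching sketch for the T2S feature, with the same assumed `score_fn(inp, out)` interface; here the candidate translation is the input and the original source is the sequence being scored:

```python
from typing import Callable, List

def t2s_scores(
    score_fn: Callable[[str, str], float],  # assumed: score_fn(tgt, src) = log P(src | tgt)
    src: str,
    nbest: List[str],
) -> List[float]:
    """Score how well each candidate 'explains' the source sentence.

    A low score hints that the candidate dropped source content,
    which the forward (S2T) likelihood alone cannot detect.
    """
    return [score_fn(hyp, src) for hyp in nbest]
```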
N-gram Language Model
train n-gram LMs on a large monolingual corpus and select the k best LMs by perplexity on newsdev2017 (character-level for Zh, word-level for En)
compute the PPL of each translation candidate under the selected LMs (see the sketch below)
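A sketch of the perplexity feature using the `kenlm` Python bindings (one common n-gram LM toolkit; the paper does not say which toolkit was used, and the ARPA file name here is hypothetical):

```python
import kenlm  # assumes the kenlm Python bindings are installed

lm = kenlm.Model("news.zh.5gram.arpa")  # hypothetical pre-trained n-gram LM

def lm_ppl(nbest):
    """Per-word perplexity of each (whitespace-tokenized) candidate; lower = more fluent."""
    return [lm.perplexity(hyp) for hyp in nbest]
```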
Ensemble
generate the n-best translations with an ensemble model
collect likelihood scores from the R2L, T2S and LM models as features
use k-batched MIRA to tune the weight of each feature (see the combination sketch below)
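At inference time the reranker is just a weighted feature sum; the sketch below assumes the weights have already been tuned with k-batched MIRA on a dev set (the tuning loop itself, which pushes weights toward hypotheses with better sentence-level BLEU, is omitted):

```python
from typing import Dict, List

def rerank(
    candidates: List[str],
    features: Dict[str, List[float]],  # e.g. {"nmt": [...], "r2l": [...], "t2s": [...], "lm": [...]}
    weights: Dict[str, float],         # assumed pre-tuned with k-batched MIRA
) -> str:
    """Return the candidate with the highest weighted feature sum."""
    def total(i: int) -> float:
        return sum(w * features[name][i] for name, w in weights.items())

    best = max(range(len(candidates)), key=total)
    return candidates[best]
```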
NMT with Tagging Model
similar to the placeholder mechanism
how it works : replace OOVs with pre-defined tags, translate the tagged source sentences with the NMT model, then recover the tags in the translation using the attention weights and a bilingual dictionary (see the toy sketch after this list)
use a CRF-based NER (named entity recognition) tagger to obtain the tags
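A toy sketch of the tag-replace-recover cycle. Two simplifications versus the paper: entities come in as plain strings (a CRF tagger would supply them), and tags are recovered by string matching rather than attention-weight alignment, on the assumption that the NMT model copies placeholder tokens through. `ne_dict` stands in for the bilingual dictionary:

```python
from typing import Dict, List, Tuple

def tag_entities(src: str, entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each NER-detected entity with an indexed placeholder tag."""
    mapping = {}
    for i, ent in enumerate(entities):
        tag = f"$NE{i}"
        src = src.replace(ent, tag, 1)
        mapping[tag] = ent
    return src, mapping

def recover_tags(hyp: str, mapping: Dict[str, str], ne_dict: Dict[str, str]) -> str:
    """Swap each tag in the NMT output for a dictionary translation
    of the original entity (fall back to copying the entity itself)."""
    for tag, ent in mapping.items():
        hyp = hyp.replace(tag, ne_dict.get(ent, ent))
    return hyp
```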
Named-Entity Translation
since most rare words in news data are person names, a separate character-based encoder-decoder model is trained on person-name pairs extracted from the parallel training data
all numbers greater than 5000 are replaced with a number named-entity tag for efficient and correct translation (see the sketch below)
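A sketch of the number-replacement preprocessing; the tag token and regex are my assumptions, while the "greater than 5000" threshold is from the paper:

```python
import re
from typing import List, Tuple

NUM_TAG = "$number"  # hypothetical placeholder token

def replace_large_numbers(sentence: str) -> Tuple[str, List[str]]:
    """Replace every number greater than 5000 with a tag, keeping the
    originals so they can be copied back into the translation."""
    kept: List[str] = []

    def repl(match: re.Match) -> str:
        if int(match.group()) > 5000:
            kept.append(match.group())
            return NUM_TAG
        return match.group()

    return re.sub(r"\d+", repl, sentence), kept
```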
Personal Thoughts
the tagger and NE translation seem effective in practical applications
the T2S, R2L and LM rerankers seem costly relative to their gains in BLEU or human evaluation
Link : http://www.statmt.org/wmt17/pdf/WMT42.pdf
Authors : Wang et al. 2017