the quality of prefixes is higher than that of suffixes in NMT output
method :
train an R2L NMT model by reversing the word order of the target-side data
rescore the n-best list from the original (L2R) NMT model with the R2L model's likelihood (see the sketch below)
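A minimal sketch of the R2L rescoring step, assuming a generic `score_fn(src, tgt)` callable that returns log P(tgt | src) under the R2L model (any seq2seq toolkit's scoring API would fit); the whitespace reversal convention is an assumption, the paper's exact tokenization may differ:

```python
from typing import Callable, List

def r2l_scores(
    score_fn: Callable[[str, str], float],  # assumed: score_fn(src, tgt) = log P(tgt | src)
    src: str,
    nbest: List[str],
) -> List[float]:
    """Score each L2R n-best candidate with the R2L model.

    The R2L model was trained on reversed target sentences, so each
    candidate is word-reversed before scoring.
    """
    return [score_fn(src, " ".join(reversed(hyp.split()))) for hyp in nbest]
```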
Target-to-Source NMT
translations may be inadequate on the target side, i.e. fail to cover all of the source content
method :
train a T2S NMT model (swap the source and target languages)
compute a likelihood score for each candidate under the T2S model as well (see the sketch below)
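A matching sketch for the T2S feature, with the same assumed `score_fn(inp, out)` interface; here the candidate translation is the input and the original source is the sequence being scored:

```python
from typing import Callable, List

def t2s_scores(
    score_fn: Callable[[str, str], float],  # assumed: score_fn(tgt, src) = log P(src | tgt)
    src: str,
    nbest: List[str],
) -> List[float]:
    """Score how well each candidate 'explains' the source sentence.

    A low score hints that the candidate dropped source content,
    which the forward (S2T) likelihood alone cannot detect.
    """
    return [score_fn(hyp, src) for hyp in nbest]
```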
N-gram Language Model
train n-gram LMs on a large monolingual corpus and select the k best LMs by perplexity on newsdev2017 (character-level for Zh, word-level for En)
compute the PPL of each translation candidate under the selected LMs (see the sketch below)
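A sketch of the perplexity feature using the `kenlm` Python bindings (one common n-gram LM toolkit; the paper does not say which toolkit was used, and the ARPA file name here is hypothetical):

```python
import kenlm  # assumes the kenlm Python bindings are installed

lm = kenlm.Model("news.zh.5gram.arpa")  # hypothetical pre-trained n-gram LM

def lm_ppl(nbest):
    """Per-word perplexity of each (whitespace-tokenized) candidate; lower = more fluent."""
    return [lm.perplexity(hyp) for hyp in nbest]
```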
Ensemble
generate the n-best translations with an ensemble model
collect likelihood scores from the R2L, T2S and LM models as features
use k-batched MIRA to tune the weight of each feature (see the combination sketch below)
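At inference time the reranker is just a weighted feature sum; the sketch below assumes the weights have already been tuned with k-batched MIRA on a dev set (the tuning loop itself, which pushes weights toward hypotheses with better sentence-level BLEU, is omitted):

```python
from typing import Dict, List

def rerank(
    candidates: List[str],
    features: Dict[str, List[float]],  # e.g. {"nmt": [...], "r2l": [...], "t2s": [...], "lm": [...]}
    weights: Dict[str, float],         # assumed pre-tuned with k-batched MIRA
) -> str:
    """Return the candidate with the highest weighted feature sum."""
    def total(i: int) -> float:
        return sum(w * features[name][i] for name, w in weights.items())

    best = max(range(len(candidates)), key=total)
    return candidates[best]
```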
NMT with Tagging Model
similar to the placeholder mechanism
how it works : replace OOVs with pre-defined tags, translate the tagged source sentences with the NMT model, then recover the tags in the translation using the attention weights and a bilingual dictionary (see the toy sketch after this list)
use a CRF-based NER (named entity recognition) tagger to obtain the tags
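A toy sketch of the tag-replace-recover cycle. Two simplifications versus the paper: entities come in as plain strings (a CRF tagger would supply them), and tags are recovered by string matching rather than attention-weight alignment, on the assumption that the NMT model copies placeholder tokens through. `ne_dict` stands in for the bilingual dictionary:

```python
from typing import Dict, List, Tuple

def tag_entities(src: str, entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each NER-detected entity with an indexed placeholder tag."""
    mapping = {}
    for i, ent in enumerate(entities):
        tag = f"$NE{i}"
        src = src.replace(ent, tag, 1)
        mapping[tag] = ent
    return src, mapping

def recover_tags(hyp: str, mapping: Dict[str, str], ne_dict: Dict[str, str]) -> str:
    """Swap each tag in the NMT output for a dictionary translation
    of the original entity (fall back to copying the entity itself)."""
    for tag, ent in mapping.items():
        hyp = hyp.replace(tag, ne_dict.get(ent, ent))
    return hyp
```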
Named-Entity Translation
since most rare words in news data are person names, a separate character-based encoder-decoder model is trained on person-name pairs extracted from the parallel training data
all numbers greater than 5000 are replaced with a number named-entity tag for efficient and correct translation (see the sketch below)
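A sketch of the number-replacement preprocessing; the tag token and regex are my assumptions, while the "greater than 5000" threshold is from the paper:

```python
import re
from typing import List, Tuple

NUM_TAG = "$number"  # hypothetical placeholder token

def replace_large_numbers(sentence: str) -> Tuple[str, List[str]]:
    """Replace every number greater than 5000 with a tag, keeping the
    originals so they can be copied back into the translation."""
    kept: List[str] = []

    def repl(match: re.Match) -> str:
        if int(match.group()) > 5000:
            kept.append(match.group())
            return NUM_TAG
        return match.group()

    return re.sub(r"\d+", repl, sentence), kept
```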
Personal Thoughts
the tagger and NE translation seem effective in practical applications
the T2S, R2L and LM rerankers seem costly relative to their gains in BLEU or human evaluation
Link : http://www.statmt.org/wmt17/pdf/WMT42.pdf
Authors : Wang et al. 2017