This request adds an implementation of guided alignment training as described in Guided Alignment Training for Topic-Aware Neural Machine Translation (Chen et al., 2016). In summary (a sketch of each piece follows the list):

preprocess.py: optionally stores source-to-target alignments in sparse format (-alignfile, -alignvalfile)
s2sa/data.lua: if present, loads the alignments and converts them into dense format per batch
train.lua: optionally creates a parallel criterion combining the decoder criterion (ClassNLLCriterion) with the guided-alignment criterion (MSECriterion) (-guided_alignment, -guided_alignment_weight, -guided_alignment_decay)
s2sa/models.lua: optionally exposes attention output in the decoder model
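For the s2sa/data.lua side, here is a minimal sketch of the per-batch densification, assuming the alignments arrive as {target_idx, source_idx} link pairs from preprocessing; the helper name `to_dense` and the row normalization are illustrative, not necessarily what the PR does:

```lua
require 'torch'

-- hypothetical helper: expand sparse alignment links for one sentence
-- into a dense tgt_len x src_len matrix matching the attention shape
local function to_dense(links, tgt_len, src_len)
  local align = torch.zeros(tgt_len, src_len)
  for _, l in ipairs(links) do
    align[l[1]][l[2]] = 1
  end
  -- normalize each target row into a distribution over source words
  for t = 1, tgt_len do
    local s = align[t]:sum()
    if s > 0 then align[t]:div(s) end
  end
  return align
end

-- e.g. "das Haus" -> "the house" aligned 1-1 and 2-2:
local dense = to_dense({{1, 1}, {2, 2}}, 2, 2)
```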
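For the combined objective in train.lua, a minimal sketch using nn.ParallelCriterion for one decoder timestep; the weighting shown here (NLL at weight 1, MSE scaled by the alignment weight) is an assumption about how the flags are wired up, and the decay schedule is only indicated in a comment:

```lua
require 'nn'

-- assumed weighting: alignment term scaled by -guided_alignment_weight,
-- which would be multiplied by -guided_alignment_decay after each epoch
local guided_alignment_weight = 0.5

local criterion = nn.ParallelCriterion()
criterion:add(nn.ClassNLLCriterion(), 1)
criterion:add(nn.MSECriterion(), guided_alignment_weight)

-- forward takes {decoder_log_probs, attention_weights} against
-- {gold_word_indices, dense_gold_alignments} for one timestep
local batch, vocab, src_len = 2, 10, 7
local log_probs = nn.LogSoftMax():forward(torch.randn(batch, vocab))
local attn      = nn.SoftMax():forward(torch.randn(batch, src_len))
local targets   = torch.LongTensor(batch):random(vocab)
local gold      = nn.SoftMax():forward(torch.randn(batch, src_len))

local loss = criterion:forward({log_probs, attn}, {targets, gold})
```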
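And the s2sa/models.lua change amounts to returning the attention softmax as an extra graph output so the criterion above can reach it. A toy, self-contained nngraph illustration; the dot-product attention and all names here are placeholders for the real decoder:

```lua
require 'nn'
require 'nngraph'

local rnn_size, src_len = 4, 7
local hidden  = nn.Identity()()   -- batch x rnn_size
local context = nn.Identity()()   -- batch x src_len x rnn_size

-- toy dot-product attention scores: batch x src_len x 1
local scores = nn.MM(false, false)({context,
  nn.View(rnn_size, 1):setNumInputDims(1)(hidden)})
local attn = nn.SoftMax()(nn.Squeeze(3)(scores))            -- batch x src_len
local ctx  = nn.Squeeze(2)(nn.MM(false, false)({
  nn.View(1, -1):setNumInputDims(1)(attn), context}))       -- batch x rnn_size

-- the attention distribution becomes a second module output
local decoder = nn.gModule({hidden, context}, {ctx, attn})

local out = decoder:forward({torch.randn(2, rnn_size),
                             torch.randn(2, src_len, rnn_size)})
-- out[1]: context vector, out[2]: attention weights for the criterion
```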
Cool! Also relevant:
http://arxiv.org/pdf/1609.04186.pdf
(the above work claims that cross entropy does slightly better than MSE for training the attention part of the model)
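If someone wanted to try that variant here, a cross-entropy-style attention loss can be approximated in Torch with nn.DistKLDivCriterion, since KL divergence to a fixed target differs from cross entropy only by the target's (constant) entropy; a sketch, with all tensors illustrative:

```lua
require 'nn'

-- nn.DistKLDivCriterion expects log-probabilities as input and a target
-- distribution; its gradients match a cross-entropy objective
local attn_criterion = nn.DistKLDivCriterion()

local attn = torch.rand(2, 7)
attn:cdiv(attn:sum(2):expandAs(attn))   -- normalize rows to distributions
local gold = torch.rand(2, 7)
gold:cdiv(gold:sum(2):expandAs(gold))

local loss = attn_criterion:forward(torch.log(attn), gold)
```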