kweonwooj opened this issue 5 years ago
> log-scale speed up in decoding process with comparable BLEU score with L2R

Do you have an implemented version of this? I am a little lost on how to generate a suitable dataset.
@sIncerass I found a repo which may help you (I have not validated it though): https://github.com/levensteins-monster/insertion_transformer
Thanks, I guess this must come from Jiatao.
## Abstract

- this paper presents **Insertion Transformer**, an iterative and partially autoregressive model for sequence generation based on insertion operations
- achieves a log-scale speed up in the decoding process with a BLEU score comparable to left-to-right (L2R) decoding

## Details
### Introduction
### Model

Adjustments from the original Transformer:

- for `n` target tokens, Insertion Transformer models `n + 1` slots, one between each adjacent pair of tokens (including the sequence boundaries). Each slot is represented by the concatenation of the adjacent pair of token representations, with special bos/eos tokens padding the ends
- `p(c, l)` can be modeled as a joint distribution or as a factorized (conditional) distribution; both variants are sketched after this list
  - joint: `p(c, l) = softmax(flatten(H W))`, where `H`, shape `(T+1 x h)`, is the last layer of the decoder and `W`, shape `(h x C)`, is the softmax projection layer; a single softmax covers all vocabulary items over all locations
  - conditional: `p(c, l) = p(c | l) * p(l)`, with `p(l) = softmax(H q)` and `p(c | l) = softmax(h_l W)`, where `h_l`, shape `(h)`, is the l-th row of `H` and `q`, shape `(h)`, is a learnable query vector
- the Contextual + Mixture variant leads to the best performance, but the gap disappears once the `eos penalty` is used
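A minimal runnable PyTorch sketch of the two parameterizations above; the tensor sizes, the random projections, and the exact way slot representations are built are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: n target tokens -> n + 1 slots, hidden size h, vocab size C.
n, h, C = 4, 16, 100
num_slots = n + 1

# Slot representations: concatenate the decoder outputs of each adjacent pair of
# positions (the target padded with bos/eos gives n + 2 positions), then project
# back down to size h. The random projection stands in for a learned one.
dec_out = torch.randn(n + 2, h)                           # decoder states incl. bos/eos
slot_in = torch.cat([dec_out[:-1], dec_out[1:]], dim=-1)  # (n+1, 2h)
H = slot_in @ torch.randn(2 * h, h)                       # (n+1, h) slot matrix

W = torch.randn(h, C)   # shared softmax projection
q = torch.randn(h)      # learnable query vector for the location model

logits = H @ W          # (n+1, C) per-slot content logits

# Joint parameterization: one softmax over all (location, content) pairs.
p_joint = F.softmax(logits.flatten(), dim=0).view(num_slots, C)

# Factorized parameterization: p(c, l) = p(c | l) * p(l).
p_loc = F.softmax(H @ q, dim=0)                  # (n+1,)   p(l)
p_content_given_loc = F.softmax(logits, dim=-1)  # (n+1, C) p(c | l)
p_factorized = p_loc.unsqueeze(-1) * p_content_given_loc  # (n+1, C)
```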
### Training

- Balanced Binary Tree: the loss rewards inserting the tokens closest to the center of each missing span first, encouraging a balanced binary tree generation order (see the sketch below)
- Termination Condition
  - `end-of-slot` token: slot-level finalization, where a slot predicts `end-of-slot` as soon as nothing remains to be inserted at that location
  - `eos` token: sequence-level finalization, where the `eos` token is produced once the entire sequence is produced and all locations are empty spans
- Training Differences
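The Balanced Binary Tree item is only named in these notes; as a hedged sketch of the idea, the toy function below assigns center-weighted target weights to the tokens of one missing span (the temperature `tau` and the exact weighting scheme are assumptions for illustration, not the paper's implementation):

```python
import math

def binary_tree_weights(span_len, tau=1.0):
    """Center-weighted target weights for the tokens of one missing span:
    tokens closer to the middle of the span get exponentially larger weight,
    which rewards building the output like a balanced binary tree."""
    center = (span_len - 1) / 2.0
    scores = [math.exp(-abs(i - center) / tau) for i in range(span_len)]
    total = sum(scores)
    return [s / total for s in scores]

# For a span of 5 missing tokens, the middle token gets the largest weight.
print(binary_tree_weights(5))
```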
### Inference

- Greedy Decoding: insert one token at a time by picking the single best (content, location) pair
- Parallel Decoding: fill the best token into every unfinished slot simultaneously at each step, so a sequence of length `n` can be generated in as few as `log_2(n) + 1` steps
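A minimal sketch of what such a parallel decoding loop could look like, assuming a hypothetical `predict_slots(hyp)` call that returns one prediction per slot (a list of length `len(hyp) + 1`, using an end-of-slot marker when nothing should be inserted):

```python
END_OF_SLOT = "<end-of-slot>"

def parallel_decode(predict_slots, max_steps=64):
    """Greedy parallel decoding: at every step, insert the best token into
    every unfinished slot simultaneously."""
    hyp = []  # start from the empty sequence
    for _ in range(max_steps):
        slot_preds = predict_slots(hyp)       # one prediction per slot
        if all(tok == END_OF_SLOT for tok in slot_preds):
            break  # every slot is finished -> sequence is complete
        new_hyp = []
        for i, tok in enumerate(slot_preds):  # slot i sits just before hyp[i]
            if tok != END_OF_SLOT:
                new_hyp.append(tok)
            if i < len(hyp):
                new_hyp.append(hyp[i])
        hyp = new_hyp
    return hyp
```

Since every unfinished slot receives one token per iteration, the hypothesis can roughly double in length each step, which is where the `log_2(n) + 1` bound comes from.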
### Experiments
- `transformer_base` setup, trained up to 1M steps with 8 x P100 GPUs
- `EOS penalty`: select the EOS token only if its log-probability is at least `beta` better than the alternatives (i.e. do not produce eos unless the model is really confident about it); this is needed because the eos token is very frequent at training time (see the sketch at the end of this section)
- using `eos penalty` + `knowledge distillation` data as the training target and decoding with Parallel Decoding results in improved performance on the dev set
- Parallel Decoding
- Test Result
- Examples of Decoding
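As a small illustration of the `EOS penalty` rule mentioned above, here is a toy selection function; the margin interpretation of `beta` and the dict-based interface are assumptions made for the sketch:

```python
import math

def select_token(log_probs, eos_id, beta):
    """Pick EOS only when its log-probability beats the best non-EOS candidate
    by at least `beta`; otherwise pick the best non-EOS token.
    `log_probs` maps token id -> log-probability."""
    best_non_eos = max((t for t in log_probs if t != eos_id),
                       key=lambda t: log_probs[t])
    if log_probs.get(eos_id, -math.inf) >= log_probs[best_non_eos] + beta:
        return eos_id
    return best_non_eos

# EOS (id 0) has the highest score, but its margin over token 1 is only 0.3,
# below beta = 0.5, so the decoder keeps generating instead of stopping early.
print(select_token({0: -0.9, 1: -1.2, 2: -3.0}, eos_id=0, beta=0.5))  # -> 1
```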
## Personal Thoughts
Link : https://arxiv.org/pdf/1902.03249.pdf
Authors : Stern et al. 2019