Propose to speed up the decoder by applying a more flexible beam search strategy whose candidate size may vary at each time step depending on the candidate score
10% speed up in beam_size=5 without loss of accuracy
Details
Beam Search is disadvantageous
Less adaptive, it expands candidates whose scores are much worse than the current best
It discards hypotheses if they are not within the best scoring candidates, even if the scores are close
Search Strategies
Relative Threshold Pruning
relative threshold compared to the best candidte
Absolute Threshold Pruning
Discard candidates less than absolute margin than best candidate
Relative Local Threshold Pruning
Consider the score of last generated word only in pruning
Max Candidates per Node
Fix the number of candidates with same history in each time step
Fan Out per Sentence
fan out : number of candidates we expand
Original BeamSearch has linear fan out
Proposed BeamSearch adaptively reduces the number of fan outs
Results
With beam_size=5, 10~13% speed improvement with using all the proposed methods
Personal Thoughts
Better decoding speed without hurting performance, this is what I've wanted!
Abstract
Details
Beam Search is disadvantageous
Search Strategies
Fan Out per Sentence
Results
Personal Thoughts
Link : https://arxiv.org/pdf/1702.01806.pdf Authors : Freitag et al. 2017