Abstract
propose a novel blockwise parallel decoding scheme that makes predictions for multiple time steps in parallel and then backs off to the longest prefix validated by a scoring model
apply it to existing SoTA models for machine translation and image super-resolution
achieves a 2x speedup without loss in quality (MT)
achieves a 3.3x speedup with a slight loss in quality (MT)
Details
Introduction
Non-Autoregressive Decoding
Problem : although encoding the source sentence can be parallelized via self-attention, decoding the target sentence is still autoregressive and hence slow and inefficient
Fully non-autoregressive models (Gu et al., 2017) are difficult to train and lead to a large loss in quality
Discrete latent variable models (Kaiser et al., 2018) do not reach SoTA quality
Iterative refinement (Lee et al., 2018) shows impressive results, but the speedup is not significant
Blockwise Parallel Decoding
restricted to Greedy Decoding
Algorithm
Predict : predict the next k tokens of the block in parallel, one per output head
Verify : find the largest k~ that is quality-equivalent to greedy decoding
run the base model in parallel on each of the k prefixes formed by the proposed tokens, using the proposals as oracle inputs (these calls can be reused as the next predict step)
check the validity of the k proposed tokens against the base model's greedy predictions and accept the longest matching prefix k~
Accept : extend the result by the first k~ verified tokens (see the sketch after this list)
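A minimal sketch of the predict/verify/accept loop. The functions propose() (standing in for the k extra output heads) and base_predict() (standing in for the original greedy model) are toy placeholders, not the paper's implementation, and the verification calls that would run inside one batched decoder pass are written as a plain loop for clarity.

```python
def propose(prefix, k):
    # Toy proposal model: emits k increasing integers as "tokens".
    return [len(prefix) + i for i in range(k)]

def base_predict(prefix):
    # Toy base model: greedy next token given a prefix.
    return len(prefix)

def blockwise_parallel_decode(k, target_len):
    y = []
    while len(y) < target_len:
        # Predict: propose the next k tokens in one shot.
        block = propose(y, k)

        # Verify: keep the longest prefix of the block whose tokens match what
        # the base greedy decoder would have produced one step at a time.
        k_hat = 0
        for i, tok in enumerate(block):
            if tok == base_predict(y + block[:i]):
                k_hat += 1
            else:
                break
        # In the paper the first head is the base model itself, so at least one
        # token is always accepted; this guard plays that role in the toy setup.
        k_hat = max(k_hat, 1)

        # Accept: extend the output by the verified prefix.
        y.extend(block[:k_hat])
    return y[:target_len]

print(blockwise_parallel_decode(k=4, target_len=10))
```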
Approximate Inference
Top-k selection : relax the acceptance condition by allowing a proposed token to match any of the base model's top-k candidates instead of only the argmax
Distance-based selection : for images, a distance metric d (with a tolerance) can be used as the acceptance criterion; both relaxations are sketched below
Minimum Block Size : to guarantee a minimum speedup, we can require at least l tokens to be accepted per step. The ablation study shows this hurts quality. (min_block_size=1 is best)
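A small sketch of the relaxed verification criteria. The names topk_candidates, d, and epsilon are illustrative placeholders under the assumption that the base model exposes its top-scoring candidates and, for images, a distance metric with a tolerance; they are not from the paper.

```python
def verify_exact(proposed, greedy):
    # Exact scheme: accept only if the proposal equals the greedy token.
    return proposed == greedy

def verify_topk(proposed, topk_candidates):
    # Top-k selection: accept if the proposal is among the base model's
    # k highest-scoring candidates instead of only the single argmax.
    return proposed in topk_candidates

def verify_distance(proposed, greedy, d, epsilon):
    # Distance-based selection (e.g. image intensities): accept if the
    # proposal is within epsilon of the greedy value under metric d.
    return d(proposed, greedy) <= epsilon

# Toy usage:
print(verify_topk(7, topk_candidates=[3, 7, 9]))                        # True
print(verify_distance(130, 128, d=lambda a, b: abs(a - b), epsilon=2))  # True
```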
Training
pre-train a Transformer base model on WMT14 EnDe for 100k steps
modify the decoder by extending it to k output layers and fine-tune for 100k steps
due to memory constraints, the mean of the k cross-entropy losses cannot be used, so one of the k sub-losses is sampled uniformly at random as an unbiased estimate of the full loss (see the sketch after this list)
Knowledge distillation for smoother training
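A sketch of the memory-saving loss: instead of averaging all k head losses, one head is sampled uniformly per step, which has the same expectation as the full mean. head_loss below is a hypothetical placeholder for the cross-entropy of a single output head.

```python
import random

def head_loss(j, batch):
    # Placeholder for the cross-entropy loss of output head j on a batch.
    return float(j)  # toy value so the example runs

def sampled_loss(k, batch):
    # Sampling j uniformly gives E[head_loss(j)] = (1/k) * sum_j head_loss(j),
    # i.e. an unbiased estimate of the mean over all k heads, while only one
    # head's loss (and its activations) needs to be kept in memory.
    j = random.randrange(k)
    return head_loss(j, batch)

print(sampled_loss(k=4, batch=None))
```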
Machine Translation (Experiments)
Methods
Regular : freeze the pre-trained model and train only the k new output layers
Distillation : freeze the pre-trained model and train the k output layers on distilled data
Fine-Tuning : fine-tune the pre-trained model together with the k output layers
Both : fine-tune the pre-trained model together with the k output layers on distilled data
Result
Combining distillation and fine-tuning leads to a significant speed improvement while maintaining quality
Wall-Clock Speedup
the mean accepted block size is a proxy for the speedup; an actual wall-clock speedup of about 3x is obtained for MT
Example
Generation Process : in the paper's example, step 1 outputs 10 tokens simultaneously
Overall Performance
3x speedup with only a 1 BLEU point loss (k=4 or k=6 seems practical)
Personal Thoughts
wow, great paper with a simple yet effective idea!
Link : https://arxiv.org/pdf/1811.03115.pdf / Authors : Stern et al., 2018