kweonwooj / papers

summary of ML papers I've read

Accelerating Neural Transformer via an Average Attention Network #108

Open kweonwooj opened 6 years ago

kweonwooj commented 6 years ago

Abstract

Details

Introduction

[screenshot from the paper]

new Transformer

[screenshot from the paper]
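Since the model details live in the screenshots, here is a rough NumPy sketch of the average attention layer as I understand it: the masked decoder self-attention is replaced by a cumulative average over previous positions, followed by a position-wise feed-forward layer and a gating layer. Function names, weight shapes, and the ReLU inside the FFN are my own assumptions, not the paper's code.

```python
import numpy as np

def average_attention_layer(Y, W_ffn1, b1, W_ffn2, b2, W_gate, b_gate):
    """Rough sketch of an Average Attention Network (AAN) layer.

    Y: (seq_len, d_model) decoder-side embeddings for positions 1..t.
    All weight names/shapes are illustrative assumptions.
    """
    seq_len, d_model = Y.shape

    # Cumulative average g_j = (1/j) * sum_{k<=j} y_k,
    # standing in for masked decoder self-attention.
    G = np.cumsum(Y, axis=0) / np.arange(1, seq_len + 1)[:, None]

    # Position-wise feed-forward over the averaged context (ReLU assumed).
    H = np.maximum(G @ W_ffn1 + b1, 0) @ W_ffn2 + b2

    # Gating layer: input gate i_j and forget gate f_j computed from [y_j; h_j].
    gates = 1.0 / (1.0 + np.exp(-(np.concatenate([Y, H], axis=-1) @ W_gate + b_gate)))
    i, f = gates[:, :d_model], gates[:, d_model:]
    H_tilde = i * Y + f * H

    # Residual connection + layer norm, as elsewhere in the Transformer.
    out = Y + H_tilde
    mean, std = out.mean(-1, keepdims=True), out.std(-1, keepdims=True)
    return (out - mean) / (std + 1e-6)
```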

Training

Decoding

[screenshots from the paper]
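The decoding speed-up comes from the cumulative average being updatable incrementally with a constant-size state per layer, instead of attending over all previous positions as decoder self-attention does. A minimal sketch, assuming the update g_t = ((t-1)·g_{t-1} + y_t) / t; the function and state names are hypothetical:

```python
import numpy as np

def aan_decode_step(y_t, state):
    """One decoding step of the cumulative-average layer.

    state = (running_sum, t) carried over from the previous step;
    y_t is the current target-side embedding.
    """
    running_sum, t = state
    running_sum = running_sum + y_t
    t = t + 1
    g_t = running_sum / t  # equivalent to g_t = ((t-1) * g_{t-1} + y_t) / t
    return g_t, (running_sum, t)

# Usage: only an O(d_model) vector is carried between steps, instead of the
# full history of keys/values required by decoder self-attention.
d_model = 512
state = (np.zeros(d_model), 0)
for step in range(4):
    y_t = np.random.randn(d_model)  # stand-in for the current embedding
    g_t, state = aan_decode_step(y_t, state)
```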

Personal Thoughts

Link: https://arxiv.org/pdf/1805.00631.pdf
Authors: Zhang et al. 2018

chqiwang commented 6 years ago

It seems like the baseline Transformer is not well optimized. What do you think?

kweonwooj commented 6 years ago

@chqiwang I agree. They do not follow the Transformer Big architecture and train for only up to 100k steps. I suppose the reason is a hardware limitation: they use a single GTX 1080.

I like the direction of the paper: the Transformer decoder is over-capacity, and I believe decoding can be made more efficient without a large loss in performance.