Open kweonwooj opened 5 years ago
As I understand it, we have a GLU that expands the dimension to n x 2d, and then the conv rescales it back to n x d, right? I still don't understand how the softmax is applied in LightweightConv. Do we softmax all of the conv layer's kernel weights?
Moreover, I'm not clear on the authors' weight sharing, since I'm trying to re-implement this architecture.
Could you give me some explanation?
Thank you so much.
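For what it's worth, the softmax in the paper normalizes each kernel only along its width (the k temporal taps), not over all of the conv layer's weights, and one normalized kernel is shared by every d/H channels of a head. Here is a minimal pure-Python sketch of that reading (the names `glu`, `lightweight_conv`, and `softmax` are illustrative helpers, not fairseq's actual modules):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def glu(x2d):
    # gated linear unit for one timestep: 2d features -> d features,
    # first half gated by sigmoid of the second half
    d = len(x2d) // 2
    a, b = x2d[:d], x2d[d:]
    return [ai * (1.0 / (1.0 + math.exp(-bi))) for ai, bi in zip(a, b)]

def lightweight_conv(x, weights, H):
    # x: n x d input (list of timesteps); weights: H x k raw kernels.
    # Each kernel is softmax-normalized over its k taps, and all d // H
    # channels of a head share the same normalized kernel (weight sharing).
    n, d = len(x), len(x[0])
    k = len(weights[0])
    norm = [softmax(w) for w in weights]
    out = []
    for t in range(n):
        row = []
        for c in range(d):
            w = norm[c // (d // H)]  # one kernel per head, shared by channels
            acc = 0.0
            for j in range(k):
                ti = t - (k - 1) + j  # causal window ending at timestep t
                if 0 <= ti < n:
                    acc += w[j] * x[ti][c]
            row.append(acc)
        out.append(row)
    return out
```

Because each kernel is softmax-normalized, the convolution computes a convex (attention-like) average over the last k timesteps of each channel.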
Abstract
lightweight convolution and dynamic convolution, a convolution whose kernel is a function of the timestep; it is lightweight, its cost is linear in the input length, and it performs better than or on par with self-attention in machine translation, summarization and language modeling

Details
Background
Light-weight Convolution

weight sharing across channels (H = 16 heads) + softmax-normalized depth-wise convolution

Dynamic Convolution
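Dynamic convolution extends the lightweight variant by predicting the kernel from the current timestep's input with a learned linear layer, instead of using a fixed learned kernel. A minimal single-head pure-Python sketch of that idea (the names `dynamic_conv`, `Wq`, and `softmax` are my own illustrations, not fairseq's DynamicConv module):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dynamic_conv(x, Wq, k):
    # x: n x d input; Wq: k x d linear map that predicts the k kernel
    # taps from the current timestep's features (one head, for brevity).
    n, d = len(x), len(x[0])
    out = []
    for t in range(n):
        # kernel depends on x[t]: raw[j] = sum_c Wq[j][c] * x[t][c]
        raw = [sum(Wq[j][c] * x[t][c] for c in range(d)) for j in range(k)]
        w = softmax(raw)  # normalized over the k taps, as in lightweight conv
        row = []
        for c in range(d):
            acc = 0.0
            for j in range(k):
                ti = t - (k - 1) + j  # causal window ending at timestep t
                if 0 <= ti < n:
                    acc += w[j] * x[ti][c]
            row.append(acc)
        out.append(row)
    return out
```

Since the kernel is recomputed at every position but only looks at the current timestep, the cost stays linear in the input length, unlike self-attention's quadratic pairwise comparisons.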
Overall Structure
Results
Personal Thoughts
Link : https://openreview.net/pdf?id=SkVhlh09tX
Authors : Wu et al., 2018