kweonwooj opened this issue 6 years ago
## Abstract

## Details
In short, the Transformer is Self-Attention + Positional Encoding + Multi-Head Attention.
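For reference, the self-attention building block here is the scaled dot-product attention of the original Transformer; multi-head attention runs several of these in parallel over learned projections:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
```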
### Multi-head Attention + FFN
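A minimal sketch of this vanilla sublayer pair, one multi-head attention block followed by a position-wise FFN, assuming standard hyperparameters (`d_model=512`, 8 heads, `d_ff=2048`); class and variable names are illustrative, not the paper's code, and residual connections / layer norm are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_k = d_model // n_heads
        self.n_heads = n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)  # W^O over concatenated heads

    def forward(self, x):
        B, T, D = x.shape
        # Project, then split into heads: (B, n_heads, T, d_k)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention per head
        scores = q @ k.transpose(-2, -1) / (self.d_k ** 0.5)
        heads = F.softmax(scores, dim=-1) @ v  # (B, n_heads, T, d_k)
        # Concatenate heads and apply the shared output projection
        concat = heads.transpose(1, 2).reshape(B, T, D)
        return self.out_proj(concat)

class PositionwiseFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(2, 10, 512)
y = PositionwiseFFN()(MultiHeadAttention()(x))  # -> (2, 10, 512)
```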
### Proposed Multi-branch Attention
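As I read it, the paper replaces the concat-then-project multi-head block with M parallel branches: each head output is scaled by a learned weight κ_i, passed through its own FFN, scaled by a second learned weight α_i, and the branches are summed, with Σκ_i = Σα_i = 1. A hedged sketch follows; the per-branch output projections and the softmax parameterization of (κ, α) are my assumptions about one way to satisfy the simplex constraint:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchAttention(nn.Module):
    def __init__(self, d_model=512, n_branches=8, d_ff=2048):
        super().__init__()
        self.d_k = d_model // n_branches
        self.n_branches = n_branches
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One output projection and one FFN per branch (no shared concat W^O)
        self.branch_proj = nn.ModuleList(
            [nn.Linear(self.d_k, d_model) for _ in range(n_branches)])
        self.branch_ffn = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                           nn.Linear(d_ff, d_model)) for _ in range(n_branches)])
        # Free logits for the simplex-constrained branch weights (assumption)
        self.kappa_logits = nn.Parameter(torch.zeros(n_branches))
        self.alpha_logits = nn.Parameter(torch.zeros(n_branches))

    def forward(self, x):
        B, T, D = x.shape
        q = self.q_proj(x).view(B, T, self.n_branches, self.d_k).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_branches, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_branches, self.d_k).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / (self.d_k ** 0.5)
        heads = F.softmax(scores, dim=-1) @ v          # (B, M, T, d_k)
        kappa = F.softmax(self.kappa_logits, dim=0)    # sums to 1
        alpha = F.softmax(self.alpha_logits, dim=0)    # sums to 1
        # BranchedAttention = sum_i alpha_i * FFN_i(kappa_i * head_i)
        out = 0
        for i in range(self.n_branches):
            branch = self.branch_proj[i](heads[:, i])  # (B, T, d_model)
            out = out + alpha[i] * self.branch_ffn[i](kappa[i] * branch)
        return out
```

Summing α-weighted branches instead of concatenating heads is what lets the model learn to emphasize some branches over others, which is presumably what the (κ, α) trend and regularization sections below visualize.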
### Weighted Transformer Architecture (diagram)
### Training Details

### Results

### Parameter Search Results

### Regularization Effect shown via Visualization

### (κ, α) Trend during Training

### Gating
## Personal Thoughts
Link : https://openreview.net/forum?id=SkYMnLxRW&noteId=SkYMnLxRW
Authors : Anonymous