mattsherar / Temporal_Fusion_Transform

Pytorch Implementation of Google's TFT

Does this model use the original Multi-head attention? #5

Open · Xanyv opened this issue 4 years ago

Xanyv commented 4 years ago

The code in tft_model.py has:

self.multihead_attn = nn.MultiheadAttention(self.hidden_size, self.attn_heads)

So it uses PyTorch's standard multi-head attention, not the Interpretable Multi-Head Attention described in the TFT paper?
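For reference, the interpretable variant described in the paper differs from nn.MultiheadAttention in two ways: all heads share a single value projection, and the head outputs are averaged rather than concatenated, so one attention weight matrix per position can be read off directly. Below is a minimal sketch of that idea; the class and parameter names are my own, not taken from this repo or the official code.

```python
import torch
import torch.nn as nn


class InterpretableMultiHeadAttention(nn.Module):
    """Sketch of interpretable multi-head attention: per-head query/key
    projections, one shared value projection, and head outputs averaged
    instead of concatenated."""

    def __init__(self, hidden_size, n_heads):
        super().__init__()
        self.q_proj = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(n_heads))
        self.k_proj = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(n_heads))
        self.v_proj = nn.Linear(hidden_size, hidden_size)  # shared across all heads
        self.out_proj = nn.Linear(hidden_size, hidden_size)
        self.scale = hidden_size ** -0.5

    def forward(self, query, key, value, mask=None):
        # query/key/value: (batch, seq_len, hidden_size)
        v = self.v_proj(value)
        head_outputs, head_weights = [], []
        for q_proj, k_proj in zip(self.q_proj, self.k_proj):
            scores = torch.matmul(q_proj(query), k_proj(key).transpose(-2, -1)) * self.scale
            if mask is not None:
                scores = scores.masked_fill(mask, float("-inf"))
            weights = torch.softmax(scores, dim=-1)
            head_weights.append(weights)
            head_outputs.append(torch.matmul(weights, v))
        # Averaging over heads keeps a single attention pattern per position,
        # which is what makes the weights interpretable.
        out = torch.stack(head_outputs).mean(dim=0)
        attn = torch.stack(head_weights).mean(dim=0)
        return self.out_proj(out), attn
```

With nn.MultiheadAttention, each head has its own value projection and the head outputs are concatenated before the output projection, so there is no single attention matrix that directly explains the output the way the paper's variant intends.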