thuml / Anomaly-Transformer

About Code release for "Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight), https://openreview.net/forum?id=LzQQ89U1qm_
MIT License

Purpose of using 1D convolutions #52

Open rubbiyasultan opened 1 year ago

rubbiyasultan commented 1 year ago

Hello,

Could you explain the purpose of using 1D convolutions in the encoder layer?

self.attention = attention
self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
self.norm1 = nn.LayerNorm(d_model)
self.norm2 = nn.LayerNorm(d_model)
Leopold2333 commented 1 year ago


It seems that many recent implementations, especially Transformer variants, use Conv1d with kernel_size=1 as the "MLP" projection that maps the input into the latent embedding space. You could also try a Linear layer in its place and compare the results. It is also worth noting that Conv layers have well-established weight initialization schemes, such as Kaiming He's method, which may lead to better performance.
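
As a quick illustration (not code from this repo), here is a minimal sketch of why the two choices are interchangeable: a Conv1d with kernel_size=1 mixes nothing across time steps, so it acts as a position-wise projection, exactly like a Linear layer applied to each time step. The PositionwiseFeedForward class, the use_conv flag, and the Kaiming initialization below are hypothetical additions for the comparison, not part of the Anomaly-Transformer code.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical feed-forward block mirroring the encoder-layer snippet above.
class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model, d_ff, use_conv=True):
        super().__init__()
        self.use_conv = use_conv
        if use_conv:
            # Conv1d operates on (batch, d_model, seq_len)
            self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size=1)
            self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size=1)
            # Optional: Kaiming (He) initialization, as suggested above
            nn.init.kaiming_normal_(self.conv1.weight, nonlinearity='relu')
            nn.init.kaiming_normal_(self.conv2.weight, nonlinearity='relu')
        else:
            # Linear operates directly on (batch, seq_len, d_model)
            self.fc1 = nn.Linear(d_model, d_ff)
            self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        if self.use_conv:
            y = x.transpose(-1, 1)          # -> (batch, d_model, seq_len)
            y = self.conv2(F.relu(self.conv1(y)))
            return y.transpose(-1, 1)       # -> (batch, seq_len, d_model)
        return self.fc2(F.relu(self.fc1(x)))

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 16, 8)               # (batch, seq_len, d_model=8)
    conv_ff = PositionwiseFeedForward(8, 32, use_conv=True)
    lin_ff = PositionwiseFeedForward(8, 32, use_conv=False)
    # Copy the conv weights into the linear layers (drop the length-1 kernel dim)
    lin_ff.fc1.weight.data = conv_ff.conv1.weight.data.squeeze(-1).clone()
    lin_ff.fc1.bias.data = conv_ff.conv1.bias.data.clone()
    lin_ff.fc2.weight.data = conv_ff.conv2.weight.data.squeeze(-1).clone()
    lin_ff.fc2.bias.data = conv_ff.conv2.bias.data.clone()
    print(torch.allclose(conv_ff(x), lin_ff(x), atol=1e-6))  # True

With shared weights the two variants produce the same output for every time step, so swapping Conv1d for Linear should only change initialization and implementation details, not the function being computed.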