thuml / Anomaly-Transformer

About Code release for "Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight), https://openreview.net/forum?id=LzQQ89U1qm_
MIT License

Purpose of using 1D convolutions #52

Open rubbiyasultan opened 1 year ago

rubbiyasultan commented 1 year ago

Hello,

Could you explain the purpose of using 1D convolutions in the encoder layer?

self.attention = attention
self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
self.norm1 = nn.LayerNorm(d_model)
self.norm2 = nn.LayerNorm(d_model)
Leopold2333 commented 1 year ago


It seems that many recent implementations, especially Transformer variants, use Conv1d with kernel_size=1 as the "MLP" projection that maps the input into the latent embedding space. You could also try a Linear layer in its place and compare the results. It is also worth noting that Conv layers have well-established weight initialization schemes, such as Kaiming He's method, which may lead to better performance.
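
As a quick illustration (not code from this repo), here is a minimal sketch of why the two choices are interchangeable: a Conv1d with kernel_size=1 mixes nothing across time steps, so it acts as a position-wise projection, exactly like a Linear layer applied to each time step. The PositionwiseFeedForward class, the use_conv flag, and the Kaiming initialization below are hypothetical additions for the comparison, not part of the Anomaly-Transformer code.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical feed-forward block mirroring the encoder-layer snippet above.
class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model, d_ff, use_conv=True):
        super().__init__()
        self.use_conv = use_conv
        if use_conv:
            # Conv1d operates on (batch, d_model, seq_len)
            self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size=1)
            self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size=1)
            # Optional: Kaiming (He) initialization, as suggested above
            nn.init.kaiming_normal_(self.conv1.weight, nonlinearity='relu')
            nn.init.kaiming_normal_(self.conv2.weight, nonlinearity='relu')
        else:
            # Linear operates directly on (batch, seq_len, d_model)
            self.fc1 = nn.Linear(d_model, d_ff)
            self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        if self.use_conv:
            y = x.transpose(-1, 1)          # -> (batch, d_model, seq_len)
            y = self.conv2(F.relu(self.conv1(y)))
            return y.transpose(-1, 1)       # -> (batch, seq_len, d_model)
        return self.fc2(F.relu(self.fc1(x)))

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 16, 8)               # (batch, seq_len, d_model=8)
    conv_ff = PositionwiseFeedForward(8, 32, use_conv=True)
    lin_ff = PositionwiseFeedForward(8, 32, use_conv=False)
    # Copy the conv weights into the linear layers (drop the length-1 kernel dim)
    lin_ff.fc1.weight.data = conv_ff.conv1.weight.data.squeeze(-1).clone()
    lin_ff.fc1.bias.data = conv_ff.conv1.bias.data.clone()
    lin_ff.fc2.weight.data = conv_ff.conv2.weight.data.squeeze(-1).clone()
    lin_ff.fc2.bias.data = conv_ff.conv2.bias.data.clone()
    print(torch.allclose(conv_ff(x), lin_ff(x), atol=1e-6))  # True

With shared weights the two variants produce the same output for every time step, so swapping Conv1d for Linear should only change initialization and implementation details, not the function being computed.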