rubbiyasultan opened this issue 1 year ago
Hello,
Could you explain the purpose of using 1D convolutions in the encoder layer?
```python
self.attention = attention
self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
self.norm1 = nn.LayerNorm(d_model)
self.norm2 = nn.LayerNorm(d_model)
```
Many recent implementations, especially Transformer variants from the last few years, use Conv1d as the "MLP" projection that maps the input to latent-space embeddings. This seems to be mostly a convention rather than a strong theoretical claim, so you could swap in a Linear layer and compare the effects. It's also worth noting that Conv layers are often paired with weight initialization schemes such as Kaiming He's method, which may lead to better performance.
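For what it's worth, a Conv1d with kernel_size=1 is just a position-wise linear map over the channel dimension, so it computes the same function as nn.Linear applied to each time step. Below is a minimal sketch (not the repo's code; the batch/sequence sizes and the Kaiming-init choice are my own assumptions) showing the equivalence and how one might initialize a Linear replacement:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, d_model, d_ff = 2, 16, 512, 2048  # illustrative sizes only

# Conv1d with kernel_size=1 acts position-wise, like nn.Linear per time step.
conv = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
linear = nn.Linear(d_model, d_ff)

# Copy the conv weights into the linear layer to show the outputs match.
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))  # (d_ff, d_model, 1) -> (d_ff, d_model)
    linear.bias.copy_(conv.bias)

x = torch.randn(batch, seq_len, d_model)
# Conv1d expects (batch, channels, seq_len), so transpose in and out.
out_conv = conv(x.transpose(-1, 1)).transpose(-1, 1)
out_linear = linear(x)
print(torch.allclose(out_conv, out_linear, atol=1e-6))  # True

# Optional: Kaiming (He) initialization for the first projection (followed by ReLU).
nn.init.kaiming_normal_(linear.weight, nonlinearity='relu')
nn.init.zeros_(linear.bias)
```

So in practice the difference between the two comes down to default initialization and the extra transposes, not expressiveness.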