Since the query and key each have only one timestep from the perspective of a 1D convolution, I think this is exactly the same as using a fully connected layer.
Is there any reason to use tf.layers.conv1d for the query and key transformations instead of a fully connected layer?
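
To make the claim concrete, here is a minimal NumPy sketch of what I mean. It is not the repository's code: the helper `conv1d_valid`, the shapes, and the random weights are made up for illustration, and it assumes a kernel size of 1 with no padding.

```python
import numpy as np

def conv1d_valid(x, kernel, bias):
    """Naive 1D convolution (cross-correlation, no padding, stride 1).

    x:      (batch, timesteps, in_channels)
    kernel: (kernel_size, in_channels, out_channels)
    bias:   (out_channels,)
    """
    batch, timesteps, _ = x.shape
    kernel_size, _, out_channels = kernel.shape
    out_steps = timesteps - kernel_size + 1
    out = np.zeros((batch, out_steps, out_channels))
    for t in range(out_steps):
        window = x[:, t:t + kernel_size, :]  # (batch, kernel_size, in_channels)
        # Contract over the kernel_size and in_channels axes.
        out[:, t, :] = np.tensordot(window, kernel, axes=([1, 2], [0, 1])) + bias
    return out

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
batch, in_dim, out_dim = 4, 8, 16

query = rng.normal(size=(batch, 1, in_dim))     # a single timestep
kernel = rng.normal(size=(1, in_dim, out_dim))  # kernel size 1 (assumed)
bias = rng.normal(size=(out_dim,))

conv_out = conv1d_valid(query, kernel, bias)    # shape (batch, 1, out_dim)
dense_out = query[:, 0, :] @ kernel[0] + bias   # fully connected layer, same weights

outputs_match = np.allclose(conv_out[:, 0, :], dense_out)
print(outputs_match)
```

With one timestep and a kernel of size 1, the convolution window covers exactly the one input vector, so the operation reduces to a single matrix multiply plus bias, which is what a fully connected layer computes.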