Open MiuMiuMiue opened 1 year ago
Sorry for the late reply. Actually, the size of the features before and after the FFT operation remains the same. For a tensor of size h × w, the output of the FFT operation has shape h × (w/2 + 1) because of the conjugate symmetry property of the FFT. The implementation takes this into account, and we do not perform a transpose operation (which would introduce unnecessary complications). As for the conjugate operation, you can implement it with torch.conj(). In our experiments, the conjugate operation had no impact on the final results, so we ultimately did not use it.
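To make the shapes concrete, here is a minimal sketch, not the repository's actual code. It assumes the query and key are real-valued and transformed with torch.fft.rfft2. The tensor names and the 8 × 8 patch size are made up for illustration. It shows that the half-spectrum has w/2 + 1 columns, that torch.conj() leaves the shape untouched, and that the element-wise product works the same way with or without the conjugate.

```python
import torch

# Hypothetical sizes for illustration: batch 1, 4 channels, 8x8 patch.
h, w = 8, 8
q = torch.randn(1, 4, h, w)
k = torch.randn(1, 4, h, w)

# For real input, rfft2 keeps only the non-redundant half of the last
# axis, so the spectrum has w // 2 + 1 columns.
q_f = torch.fft.rfft2(q)
k_f = torch.fft.rfft2(k)
half_shape = tuple(q_f.shape)

# Conjugation is element-wise: it flips the sign of the imaginary part
# and does not change the shape (no transpose involved).
k_conj = torch.conj(k_f)
conj_shape = tuple(k_conj.shape)

# Both products are element-wise over identically shaped tensors.
prod_plain = q_f * k_f      # corresponds to circular convolution
prod_conj = q_f * k_conj    # corresponds to circular cross-correlation

# irfft2 with an explicit size restores the original h x w resolution.
out_plain = torch.fft.irfft2(prod_plain, s=(h, w))
out_conj = torch.fft.irfft2(prod_conj, s=(h, w))

print(half_shape, conj_shape, tuple(out_conj.shape))
```

The two variants differ only in whether the spatial-domain result is a circular convolution or a circular cross-correlation of the query and key. The shapes are identical in both cases.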
Hi, we noticed that the paper mentions a conjugate transpose operation when computing the element-wise product between the key and the query, but we did not see this operation in this line
And since the conjugate transpose will also change the shape of the matrix, wouldn't this operation affect the element-wise product?
So it is quite confusing how you apply this operation. Can you give us some insight into this? Thanks!