I have some questions about the implementation of the relative positional encoding.
According to your implementation, the memory cost is O(H^2W^2), while the paper says the memory cost is optimized to O(HW).
I have also tried your method on semantic segmentation tasks and found it to be very slow and to consume a huge amount of memory.
I am wondering whether you have addressed these memory and time issues.
I think the conventional relative position encoding is O(H^2W^2) because it generates a matrix over every pair of the HW positions. The current code, however, is O(HW) because it only generates 1D vectors for H and W.
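To make the two layouts concrete, here is a rough sketch that only counts table entries. It is not taken from the repository: the function names are hypothetical, and it assumes one scalar per entry (no channel dimension) and relative offsets in the range -(L-1)..(L-1) along an axis of length L.

```python
def dense_pairwise_entries(h: int, w: int) -> int:
    """Entries in a table holding one value per (query, key) pair of positions."""
    n = h * w  # number of spatial positions on an H x W feature map
    return n * n  # grows as H^2 * W^2


def factored_axis_entries(h: int, w: int) -> int:
    """Entries in two 1D tables, one per axis, indexed by relative offset."""
    # An axis of length L has 2L - 1 distinct relative offsets.
    return (2 * h - 1) + (2 * w - 1)


if __name__ == "__main__":
    for h, w in [(14, 14), (32, 32), (64, 64)]:
        print(h, w, dense_pairwise_entries(h, w), factored_axis_entries(h, w))
```

This counts only the stored encoding tables. If the code later expands them into a full HW x HW logit tensor inside attention, that intermediate tensor would still scale with H^2W^2, which may be what shows up as high memory use on large segmentation inputs.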
Thanks for your project.