Memory/Time Complexity of the relative positional encoding

leaderj1001 / Attention-Augmented-Conv2d

Implementing Attention Augmented Convolutional Networks using Pytorch

MIT License

Memory/Time Complexity of the relative positional encoding #12

Open PkuRainBow opened 5 years ago

PkuRainBow commented 5 years ago

Thanks for your project.

I have some questions about the implementation of the relative positional encoding. As far as I can tell, the memory cost of your implementation is O(H^2W^2), while the paper mentions that they reduce the memory cost to O(HW).

Besides, I have also tried your method on semantic segmentation tasks and found that it is very slow and consumes a huge amount of memory.

I am wondering whether you have made any improvements to the memory and time issues.

leaderj1001 commented 5 years ago

Thanks for your comment!

Memory cost
- As I understand it, the conventional relative position encoding is O(H^2W^2) because it generates a full matrix over all pairs of the HW positions. The current code, however, is O(HW) because it only generates 1D vectors along H and W.

Time issues
- I'll fix it as soon as possible. Thank you!
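
To make the two complexity claims above easier to compare, here is a rough back-of-the-envelope sketch that counts tensor elements for one attention head. It is not taken from the repository: the function name `relative_encoding_sizes` is hypothetical, and the shapes are assumptions, namely learned 1D tables of shape (2W-1, dk) and (2H-1, dk), an explicit pairwise embedding tensor of shape (HW, HW, dk) for the naive variant, and an (HW, HW) logits matrix for full self-attention.

```python
def relative_encoding_sizes(height, width, dk):
    """Count tensor elements (not bytes) for a single attention head.

    All shapes here are assumptions made for illustration, not values
    read from the repository's code.
    """
    n = height * width  # number of spatial positions, HW
    return {
        # Learned 1D tables: one row per relative offset along each axis.
        "learned_tables": ((2 * width - 1) + (2 * height - 1)) * dk,
        # Naive variant: one dk-dim embedding for every pair of positions.
        "naive_pairwise_embeddings": n * n * dk,
        # Attention logits over all pairs of positions.
        "attention_logits": n * n,
    }


if __name__ == "__main__":
    for side in (16, 32, 64):
        print(side, relative_encoding_sizes(side, side, dk=40))
```

Under these assumptions the learned tables grow only linearly in H and W, while anything indexed by pairs of positions grows as (HW)^2. That may be why memory use is still high on large segmentation feature maps even when the embedding tables themselves are small.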