leaderj1001 / Stand-Alone-Self-Attention

Implementing Stand-Alone Self-Attention in Vision Models using Pytorch
MIT License
456 stars 83 forks

Large memory consumption #6

Closed skmhrk1209 closed 4 years ago

skmhrk1209 commented 4 years ago

Hi, thanks for your nice work. On a large dataset like ImageNet, the proposed self-attention mechanism consumes a large amount of memory because of the unfold operation (im2col). One would want to share the point-wise feature vectors among sliding windows instead of copying them. Do you have any idea?
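A minimal sketch of the blow-up being described (tensor sizes are illustrative assumptions, not from the repo): `torch.nn.functional.unfold` materializes every k × k window explicitly, so with stride 1 the unfolded tensor holds k² copies of each feature vector rather than sharing them across overlapping windows.

```python
import torch
import torch.nn.functional as F

# Hypothetical ImageNet-scale feature map; sizes chosen for illustration.
B, C, H, W = 1, 64, 56, 56
k = 7  # attention window size

x = torch.randn(B, C, H, W)

# im2col: copies every k x k neighborhood into its own column.
# Output shape is (B, C * k * k, H * W) -- k*k = 49x the input elements.
windows = F.unfold(x, kernel_size=k, padding=k // 2)

print(x.numel())        # 200704
print(windows.numel())  # 9834496 (49x larger)
```

Sharing the feature vectors would mean indexing into `x` lazily per window instead of materializing the copies, which `unfold` does not do.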

leaderj1001 commented 4 years ago

Thanks for the comment. I haven't thought of a more efficient way. I will think about a more memory-efficient self-attention mechanism. If you find a good approach, please send a pull request.

Thank you.

skmhrk1209 commented 4 years ago

Got it. Thanks.