leaderj1001 / Stand-Alone-Self-Attention

Implementing Stand-Alone Self-Attention in Vision Models using PyTorch

How about replacing einsum with normal multiplication? #14

Open tntjd7545 opened 4 years ago

tntjd7545 commented 4 years ago

In `attention.py`, in `class AttentionConv`, replacing

`out = torch.einsum('bnchwk,bnchwk -> bnchw', out, v_out)`

with

`out = (out * v_out).sum(dim=5)`
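
For anyone wanting to check this, here is a minimal sketch of why the two lines are equivalent: the einsum shares every subscript between the two operands and drops only `k` from the output, so it reduces to an element-wise product summed over the trailing kernel dimension. The shapes below are illustrative placeholders, not the actual sizes used in `AttentionConv`:

```python
import torch

# Assumed illustrative shapes: batch, groups, channels per group, H, W, kernel_size**2
b, n, c, h, w, k = 2, 4, 8, 16, 16, 9
out = torch.randn(b, n, c, h, w, k)
v_out = torch.randn(b, n, c, h, w, k)

# Original: einsum contracts the shared trailing kernel dimension k
ref = torch.einsum('bnchwk,bnchwk -> bnchw', out, v_out)

# Proposed: element-wise multiply, then sum over dim 5 (the k dimension)
fast = (out * v_out).sum(dim=5)

print(torch.allclose(ref, fast, atol=1e-6))  # True: identical result
```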

made the running time more than 2x faster when training on ImageNet (2 min vs. 53 s per 100 steps, batch size 25), though that is still about 3.5x slower than training a plain ResNet on ImageNet.

(Not sure whether this model actually works on ImageNet, though.)