leaderj1001 / Stand-Alone-Self-Attention

Implementing Stand-Alone Self-Attention in Vision Models using Pytorch
MIT License
456 stars, 83 forks

Train with IMAGENET #15

Closed: fityanul closed this issue 3 years ago

fityanul commented 4 years ago

Dear @leaderj1001

Have you succeeded in training with the ImageNet dataset? I get this error when I use ImageNet:

RuntimeError: CUDA out of memory. Tried to allocate 3.66 GiB (GPU 0; 10.91 GiB total capacity; 8.04 GiB already allocated; 1.66 GiB free; 8.05 GiB reserved in total by PyTorch)

Thank you

tntjd7545 commented 4 years ago

I tried, and training started successfully on a V100. However, the model did not train well (the accuracy was too low).

OmerElshrief commented 4 years ago

This is because the softmax operation takes too much memory. Try lowering the dimensions of the tensor before passing it to softmax.

siyuan2018 commented 3 years ago

> This is because the softmax operation takes too much memory. Try lowering the dimensions of the tensor before passing it to softmax.

Hi, can you let me know how you lower the dimension before sending it to Softmax? Thanks!
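
The thread does not say how the dimensions were lowered. One possible reading, sketched below under stated assumptions, is to sum the query-key products over the channel axis before softmax, so that softmax sees one logit per neighbour instead of one per neighbour per channel. The shapes and the names `q`, `k`, `v`, `B`, `G`, `C`, `H`, `W`, `K` are hypothetical and chosen only for illustration; they are not taken from the repository. The two variants are not numerically equivalent, so switching between them changes what the model computes.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: batch, groups, channels per group, height, width,
# and number of neighbours in the local window (e.g. 3 x 3 = 9).
B, G, C, H, W, K = 2, 2, 4, 8, 8, 9

q = torch.randn(B, G, C, H, W)      # one query vector per pixel
k = torch.randn(B, G, C, H, W, K)   # K neighbouring key vectors per pixel
v = torch.randn(B, G, C, H, W, K)   # K neighbouring value vectors per pixel

# Variant A: per-channel logits. Softmax input has shape (B, G, C, H, W, K).
logits_full = q.unsqueeze(-1) * k
attn_full = F.softmax(logits_full, dim=-1)
out_full = (attn_full * v).sum(dim=-1)                 # (B, G, C, H, W)

# Variant B: reduce over the channel axis first (a dot product between the
# query and each neighbouring key). Softmax input has shape (B, G, H, W, K),
# which is C times smaller than in variant A.
logits_small = torch.einsum('bgchw,bgchwk->bghwk', q, k)
attn_small = F.softmax(logits_small, dim=-1)
out_small = torch.einsum('bghwk,bgchwk->bgchw', attn_small, v)

print(attn_full.shape, attn_small.shape)
```

With these shapes the softmax input in variant B holds `C` times fewer elements than in variant A, and the output shape is the same in both cases. Whether this is the change the commenter had in mind is an assumption.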