cooodeKnight opened this issue 6 years ago
I'm running into the same issue; reducing the input image size is one way to work around it.
Hi! This happens because your machine (GPU) doesn't have enough memory to run the batch. The self-attention layer is very sensitive to the input size in terms of memory usage (roughly 16x more if the spatial size doubles). You can try placing self-attention at a shallower layer of the generator and a deeper layer of the discriminator, where the feature maps are smaller, but doing so may degrade output quality.
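To make the 16x figure concrete, here is a rough back-of-the-envelope estimate (just a sketch, not code from this repo): the attention/energy map is an (H·W) x (H·W) matrix, so doubling the side length multiplies its memory by 16.

```python
# Rough size of the (H*W) x (H*W) attention map in float32, ignoring activations,
# gradients, and the rest of the network.
def attn_map_mib(h, w, batch_size=1):
    n = h * w
    return batch_size * n * n * 4 / 2**20  # 4 bytes per float32 entry

for side in (32, 64, 128):
    print(f"{side}x{side}: {attn_map_mib(side, side):.0f} MiB")
# 32x32: 4 MiB, 64x64: 64 MiB, 128x128: 1024 MiB (per sample, before gradients)
```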
You can also follow the Non-local Neural Networks paper and apply spatial downsampling (when the input resolution is too large) to the input before computing proj_key and proj_value. The matrix product with the full-resolution query afterwards restores the output to the original spatial resolution.
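For reference, here is a minimal sketch of that trick, assuming a SAGAN-style Self_Attn module with 1x1 query/key/value convolutions and even H, W (the module below is illustrative, not the repo's exact code). Only proj_key and proj_value are pooled, so the output keeps the full resolution while the attention map shrinks from HW x HW to HW x HW/4.

```python
import torch
import torch.nn as nn

class Self_Attn(nn.Module):
    """Self-attention with the key/value subsampling trick from Non-local Neural Networks."""
    def __init__(self, in_dim):
        super().__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.key_conv   = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, 1)
        self.pool  = nn.MaxPool2d(2)          # spatial downsampling of key/value only
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        B, C, H, W = x.size()                 # assumes H and W are even
        proj_query = self.query_conv(x).view(B, -1, H * W).permute(0, 2, 1)       # B x HW x C'
        proj_key   = self.pool(self.key_conv(x)).view(B, -1, H * W // 4)          # B x C' x HW/4
        proj_value = self.pool(self.value_conv(x)).view(B, C, H * W // 4)         # B x C  x HW/4
        energy     = torch.bmm(proj_query, proj_key)                              # B x HW x HW/4
        attention  = torch.softmax(energy, dim=-1)
        out = torch.bmm(proj_value, attention.permute(0, 2, 1)).view(B, C, H, W)  # back to B x C x H x W
        return self.gamma * out + x
```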
Thanks a lot for this suggestion
Is the spatial downsampling introduced in the Non-local Neural Networks paper?
@ESanchezLozano Wouldn't it be better to move the self-attention layer to a later part of the network where the feature maps are smaller, e.g. 16x16 or 8x8?