cooodeKnight opened this issue 6 years ago
I'm running into the same issue; reducing the input image size is one way to work around it.
Hi! This happens because your machine (GPU) doesn't have enough memory to run the batch. The self-attention layer is very sensitive to the input size in terms of memory usage (roughly 16x more if the spatial size doubles). You can try placing self-attention at a shallower layer of the generator and a deeper layer of the discriminator, where the feature maps are smaller, but doing so may degrade output quality.
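To make the 16x figure concrete, here is a rough back-of-the-envelope estimate (just a sketch, not code from this repo): the attention/energy map is an (H·W) x (H·W) matrix, so doubling the side length multiplies its memory by 16.

```python
# Rough size of the (H*W) x (H*W) attention map in float32, ignoring activations,
# gradients, and the rest of the network.
def attn_map_mib(h, w, batch_size=1):
    n = h * w
    return batch_size * n * n * 4 / 2**20  # 4 bytes per float32 entry

for side in (32, 64, 128):
    print(f"{side}x{side}: {attn_map_mib(side, side):.0f} MiB")
# 32x32: 4 MiB, 64x64: 64 MiB, 128x128: 1024 MiB (per sample, before gradients)
```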
You can also follow the Non-local Neural Networks paper and apply spatial downsampling (when the input resolution is too large) to the input before computing proj_key and proj_value. The matrix product with the full-resolution query afterwards restores the output to the original spatial resolution.
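For reference, here is a minimal sketch of that trick, assuming a SAGAN-style Self_Attn module with 1x1 query/key/value convolutions and even H, W (the module below is illustrative, not the repo's exact code). Only proj_key and proj_value are pooled, so the output keeps the full resolution while the attention map shrinks from HW x HW to HW x HW/4.

```python
import torch
import torch.nn as nn

class Self_Attn(nn.Module):
    """Self-attention with the key/value subsampling trick from Non-local Neural Networks."""
    def __init__(self, in_dim):
        super().__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.key_conv   = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, 1)
        self.pool  = nn.MaxPool2d(2)          # spatial downsampling of key/value only
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        B, C, H, W = x.size()                 # assumes H and W are even
        proj_query = self.query_conv(x).view(B, -1, H * W).permute(0, 2, 1)       # B x HW x C'
        proj_key   = self.pool(self.key_conv(x)).view(B, -1, H * W // 4)          # B x C' x HW/4
        proj_value = self.pool(self.value_conv(x)).view(B, C, H * W // 4)         # B x C  x HW/4
        energy     = torch.bmm(proj_query, proj_key)                              # B x HW x HW/4
        attention  = torch.softmax(energy, dim=-1)
        out = torch.bmm(proj_value, attention.permute(0, 2, 1)).view(B, C, H, W)  # back to B x C x H x W
        return self.gamma * out + x
```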
Thanks a lot for this suggestion
Is the spatial downsampling introduced in the Non-local Neural Networks paper?
@ESanchezLozano Wouldn't it be better to move the self-attention layer to a later part of the network where the feature maps are smaller, e.g. 16x16 or 8x8?