[Open] lyricgoal opened this issue 3 years ago
In models/networks.py:

    energy = torch.bmm(proj_query.permute, proj_key)
    RuntimeError: CUDA out of memory. Tried to allocate 268.21 GiB (GPU 4; 10.92 GiB total capacity; 1.80 GiB already allocated; 8.52 GiB free; 47.59 MiB cached)

Could you please give me some advice?

Your batch size is too large. Try reducing it to 10, for example. I am also trying to resolve a similar issue at the moment.

Self-attention is notoriously memory hungry. Try reducing the batch size, or applying the self-attention layers more selectively.

The fact that you are attempting to allocate 268 GiB suggests that you are perhaps using 3D inputs. In that case, plain self-attention simply will not work; you will have to look into implementations with linear complexity.

I had the same issue; changing the batch size to 8 worked for me.
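For context, an allocation of this size is consistent with a quadratic attention map over a 3D input. The `energy` tensor produced by `torch.bmm(proj_query, proj_key)` has shape (B, N, N), where N is the number of spatial positions, and for volumetric data N grows as depth × height × width. A back-of-the-envelope sketch (the 64×64×64 volume is an illustrative assumption, not the poster's actual input size):

```python
# Estimate the memory needed by the attention "energy" matrix alone.
# energy = torch.bmm(proj_query, proj_key) has shape (B, N, N), where
# N is the number of spatial positions. For 3D inputs N multiplies
# across depth, height, and width, so the N x N map explodes.

def attention_energy_gib(batch, spatial_dims, bytes_per_el=4):
    """Memory (GiB) of one (B, N, N) float32 attention map."""
    n = 1
    for d in spatial_dims:
        n *= d  # N = product of spatial dimensions
    return batch * n * n * bytes_per_el / 2**30

# 2D feature map, 64x64: N = 4096 -> manageable
print(attention_energy_gib(1, (64, 64)))      # 0.0625 GiB

# 3D volume, 64x64x64: N = 262144 -> the map alone needs 256 GiB,
# the same order of magnitude as the 268.21 GiB in the error above
print(attention_energy_gib(1, (64, 64, 64)))  # 256.0 GiB
```

This is why reducing the batch size helps only up to a point: even at batch size 1, a quadratic attention map over a volumetric feature grid can exceed any single GPU, which is what motivates the linear-complexity attention suggestion above.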