Closed TheshowN closed 2 years ago
Hi, thanks for your interest in our work. By default, we use a batch size of 16 at 512x1024 on Cityscapes, trained on 8 GPUs. This occupies about 11GB per GPU. However, there are a few ways to reduce GPU memory costs: (1) For mini-batch-based KD, you can apply a pooling layer (e.g. with a 2x2 kernel) to aggregate local embeddings so that the dimension of the similarity matrix is reduced. You can uncomment Lines 96-101 in loss.py. (2) For memory-based KD, you can reduce the number of contrastive samples, e.g. via --pixel-contrast-size. (3) The most effective way may be to reduce the batch size, e.g. to 1 sample per GPU. In this case, you need to use a tensor gather operator under distributed training and uncomment Lines 92-93 in loss.py.
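To illustrate option (1), here is a minimal numpy sketch (not the actual loss.py code; the feature sizes and function names are illustrative) showing why 2x2 average pooling shrinks the similarity matrix: with N spatial embeddings the matrix is NxN, so halving H and W cuts its memory by 16x.

```python
import numpy as np

def avg_pool_2x2(feat):
    # feat: (C, H, W) feature map; H and W assumed even.
    # Reshape into 2x2 blocks and average over each block.
    C, H, W = feat.shape
    return feat.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

# Hypothetical sizes: a 512x1024 crop downsampled 8x by the backbone.
C, H, W = 256, 64, 128
feat = np.random.rand(C, H, W).astype(np.float32)

pooled = avg_pool_2x2(feat)      # (C, 32, 64)
flat = pooled.reshape(C, -1)     # (C, N) with N = 32*64 = 2048 embeddings
sim = flat.T @ flat              # pairwise similarity matrix, (N, N)

# Without pooling, N would be 64*128 = 8192, i.e. an 8192x8192 matrix;
# pooling reduces the similarity matrix to 2048x2048 (16x less memory).
print(sim.shape)
```

The same idea applies with `torch.nn.AvgPool2d(2)` in the actual training code.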
How much GPU memory do these experiments need, e.g. at 512x1024 on Cityscapes?