winycg / CIRKD

[CVPR-2022] Official implementation of CIRKD: Cross-Image Relational Knowledge Distillation for Semantic Segmentation, with experiments on Cityscapes, ADE20K, COCO-Stuff, Pascal VOC and CamVid.

GPU Memory #2

Closed TheshowN closed 2 years ago

TheshowN commented 2 years ago

How much GPU memory do these experiments need, e.g., for 512x1024 on Cityscapes?

winycg commented 2 years ago

Hi, thanks for your interest in our work. By default, we train Cityscapes at 512x1024 with a batch size of 16 on 8 GPUs, which occupies about 11 GB per GPU. There are several ways to reduce the GPU memory cost:

1. For mini-batch-based KD, you can apply a pooling layer (e.g., with a 2x2 kernel) to aggregate local embeddings so that the dimension of the similarity matrix is reduced. To enable this, uncomment Lines 96-101 in loss.py (a sketch of the idea is shown below).
2. For memory-based KD, you can reduce the number of contrastive samples, e.g., via `--pixel-contrast-size`.
3. The most effective option may be to reduce the batch size, e.g., to 1 sample per GPU. In this case, you need a tensor gather operation under distributed training; uncomment Lines 92-93 in loss.py (see the gather sketch after this comment).
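A minimal sketch of idea (1), assuming the mini-batch KD loss builds a pixel-wise similarity matrix from a `(B, C, H, W)` feature map; the function name `pooled_similarity` and the cosine normalization are illustrative, not the exact code behind Lines 96-101 in loss.py:

```python
import torch
import torch.nn.functional as F

def pooled_similarity(feat: torch.Tensor, kernel: int = 2) -> torch.Tensor:
    # Average-pool the spatial grid before building the similarity matrix,
    # shrinking it from (B*H*W)^2 to (B*H*W/kernel^2)^2 entries.
    feat = F.avg_pool2d(feat, kernel_size=kernel, stride=kernel)  # (B, C, H/k, W/k)
    b, c, h, w = feat.shape
    emb = feat.permute(0, 2, 3, 1).reshape(b * h * w, c)  # one embedding per location
    emb = F.normalize(emb, p=2, dim=1)                    # unit-norm for cosine similarity
    return emb @ emb.t()                                  # (b*h*w, b*h*w) similarity matrix
```

With a 2x2 kernel, the number of local embeddings drops by 4x and the similarity matrix by 16x, which is where most of the savings come from.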
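And a sketch of the gather step needed for idea (3), assuming one embedding tensor per GPU under `torch.distributed`; `gather_embeddings` is a hypothetical helper, not the actual Lines 92-93:

```python
import torch
import torch.distributed as dist

def gather_embeddings(local_emb: torch.Tensor) -> torch.Tensor:
    # Collect embeddings from all ranks so the relational loss still sees
    # cross-image pairs even with 1 sample per GPU.
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(local_emb) for _ in range(world_size)]
    dist.all_gather(gathered, local_emb.contiguous())
    # all_gather does not propagate gradients from remote tensors, so re-insert
    # the local slice to keep its autograd path.
    gathered[dist.get_rank()] = local_emb
    return torch.cat(gathered, dim=0)
```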