yyliu01 / PS-MT

[CVPR'22] Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
https://arxiv.org/pdf/2111.12903.pdf
MIT License
186 stars 17 forks

Questions about performance by batch size #24

Closed DeepHM closed 1 year ago

DeepHM commented 1 year ago

Hello. First of all, thank you for sharing your wonderful research!

I have some questions. I am comparing your work with CPS, which is a top-3 benchmark on semi-supervised semantic segmentation (Pascal VOC dataset). According to your paper and code, training uses an effective batch size of gpus(4) * batch_size(8) = 32. However, according to the CPS paper and code, each batch contains 8 labeled and 8 unlabeled samples. Therefore, to ensure a fair comparison, I kept all other options unchanged and trained your code with a ResNet-50 backbone, a batch size of 8 (i.e., 2 GPUs * 4 per GPU), and 80 epochs for all labeled ratios. The CPS implementation was likewise trained for 80 epochs for all labeled ratios.

Below are the results of my re-implementation (CPS vs. PS-MT).

CPS:

|                   | 1/8   | 1/4      | 1/2   |
|-------------------|-------|----------|-------|
| Score at epoch 80 | 73.74 | 72.86    | 75.77 |
| Best score        | 74.07 | 74.39    | 75.80 |
| Best epoch        | 58    | 28 or 80 | 80    |

PS-MT:

|                   | 1/8    | 1/4    | 1/2    |
|-------------------|--------|--------|--------|
| Score at epoch 80 | 74.09  | 75.406 | 75.651 |
| Best score        | 74.937 | 75.557 | 75.786 |
| Best epoch        | 67     | 66     | 78     |

These results differ from those reported in your paper. I also suspect that batch size matters considerably more in semi-supervised learning than it does in supervised learning.

I have a few questions:

  1. Is there any problem with my re-implementation described above?
  2. What is your opinion on the importance of batch size in semi-supervised learning (specifically semi-supervised semantic segmentation)? I would also appreciate any pointers to relevant references.
yyliu01 commented 1 year ago

Hi @DeepHM ,

First of all, CPS also uses multiple GPUs for training; the batch size of 8 is per GPU. They use 4 GPUs for VOC training and 8 GPUs for Cityscapes training. Please see their provided training logs.
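For clarity, here is a minimal sketch (not taken from either repo) of how a per-GPU batch size translates into the effective batch size under PyTorch DistributedDataParallel:

```python
# Minimal sketch, assuming a standard PyTorch DDP setup (not from the CPS or
# PS-MT code): each process computes gradients over its own per-GPU mini-batch
# and DDP averages them, so one optimiser step effectively covers
# per_gpu_batch_size * world_size samples.
import torch.distributed as dist

def effective_batch_size(per_gpu_batch_size: int) -> int:
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return per_gpu_batch_size * world_size

# CPS on VOC: 8 per GPU on 4 GPUs -> 32; PS-MT on VOC: 8 per GPU on 4 GPUs -> 32.
```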

If you don't have enough hardware resources (as your question suggests) and therefore have to decrease the batch size, did you also fine-tune (i.e., decrease) the learning rate? And given that a larger batch size converges faster, did you increase the number of training epochs when you decreased the batch size?
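For example, a common heuristic is the linear scaling rule, i.e. scaling the learning rate in proportion to the effective batch size; a minimal sketch (a general rule of thumb, not something our released code enforces):

```python
# Linear scaling rule: scale the learning rate in proportion to the change in
# the effective batch size. This is a common heuristic, not part of PS-MT itself.
def scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    return base_lr * new_batch_size / base_batch_size

# e.g. if the learning rate was tuned for an effective batch size of 32,
# training with a batch size of 8 would use roughly base_lr / 4.
```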

I believe halving the batch size will not affect the final performance too much, but I can't guarantee the results if you train with a batch size of only 8...

Cheers, Yuyuan

yyliu01 commented 1 year ago

I'm closing the issue. Please feel free to reopen it if you can't achieve the reported performance based on our setting.