microsoft / SoftTeacher

Semi-Supervised Learning, Object Detection, ICCV2021
MIT License

Question about epoch_length in DistributedGroupSemiBalanceSampler? #92

Closed JayYangSS closed 3 years ago

JayYangSS commented 3 years ago

Hi, I wonder why you set `epoch_length=7330` in `DistributedGroupSemiBalanceSampler`. If parameters such as the number of GPUs, `samples_per_gpu`, or the dataset length change, the number of iterations in one epoch will also change. I think you could instead set `epoch_length = int(self.total_size / self.samples_per_gpu / self.num_replicas)`.
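
For reference, a minimal sketch of the change being proposed, assuming the sampler exposes `self.total_size`, `self.samples_per_gpu`, and `self.num_replicas` as in the comment above (names taken from this thread, not necessarily matching the repository's actual implementation):

```python
# Hypothetical sketch: derive epoch_length instead of hard-coding 7330.
# Attribute names follow the comment above and may differ from the real
# DistributedGroupSemiBalanceSampler code in SoftTeacher.
class SemiBalanceSamplerSketch:
    def __init__(self, total_size, samples_per_gpu, num_replicas, epoch_length=None):
        self.total_size = total_size            # total samples across all replicas
        self.samples_per_gpu = samples_per_gpu  # batch size per GPU
        self.num_replicas = num_replicas        # number of GPUs / processes
        if epoch_length is None:
            # one "epoch" = iterations needed for each replica to consume its share
            epoch_length = int(self.total_size / self.samples_per_gpu / self.num_replicas)
        self.epoch_length = epoch_length
```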

MendelXu commented 3 years ago

The `epoch_length` is set to make sure that all labeled and unlabeled data are sampled at least once per epoch in the full-data setting. However, in our experiments we found that this parameter does not affect performance, so we did not change it for the other experiments. In fact, you can set it to as large a number as you like to make sure that all labeled/unlabeled data pairs are sampled.
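
For context, `epoch_length` is passed to the sampler through the training config. A sketch of how the relevant block might look when raising it for the full-data setting (field names and values are illustrative; check the repository's config files for the exact layout):

```python
# Illustrative sampler config block; the actual SoftTeacher configs may use
# different field names or a different sampler type string.
data = dict(
    sampler=dict(
        train=dict(
            type="DistributedGroupSemiBalanceSampler",
            sample_ratio=[1, 4],   # assumed labeled : unlabeled ratio per batch
            epoch_length=7330,     # increase so all labeled/unlabeled pairs are sampled
        )
    )
)
```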

JayYangSS commented 3 years ago

In your experiments you use an iter-based runner instead of an epoch-based runner, so I think this setting is fine. I am trying to use an epoch-based runner to compare the performance of the baseline model (labeled data only) and Soft Teacher (semi-supervised learning) under the same number of epochs. So the comparison between your Soft Teacher model and the baseline model (supervised training on labeled data) is done under the same number of iterations and the same number of labeled images per mini-batch?

MendelXu commented 3 years ago

For the partially labeled data setting, the answer is yes. However, for the fully labeled data setting, we run the model for as long as possible to make sure it reaches its best performance.

JayYangSS commented 3 years ago

Thank you for your kind reply!