The `epoch_length` is set to make sure that all labeled and unlabeled data are sampled at least once per epoch in the fully labeled data setting. However, in our experiments we found that this parameter does not affect performance, so we did not change it for the other experiments.
In fact, you can set it to a large enough number to make sure that all labeled/unlabeled data pairs are sampled.
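To make that concrete, below is a minimal sketch of a sampler whose epoch is defined by a fixed `epoch_length`. The class and attribute names are hypothetical, and this is not the repository's `DistributedGroupSemiBalanceSampler`; it only illustrates why enlarging `epoch_length` simply lengthens each epoch.

```python
import torch
from torch.utils.data import Sampler

class FixedEpochSampler(Sampler):
    """Hypothetical sampler: epoch size is set by epoch_length, not dataset size."""

    def __init__(self, dataset, epoch_length, samples_per_gpu):
        self.dataset = dataset
        self.epoch_length = epoch_length      # iterations per epoch, fixed up front
        self.samples_per_gpu = samples_per_gpu

    def __iter__(self):
        # Sample with replacement until the per-epoch budget is spent; a large
        # enough epoch_length makes it very likely that every pair is drawn.
        budget = self.epoch_length * self.samples_per_gpu
        indices = torch.randint(len(self.dataset), (budget,)).tolist()
        return iter(indices)

    def __len__(self):
        # The DataLoader's notion of "one epoch" depends only on epoch_length,
        # not on the dataset size.
        return self.epoch_length * self.samples_per_gpu
```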
In your experiments you use `iter_based_runner` instead of `epoch_based_runner`, so I think this setting is fine. I am trying to use `epoch_based_runner` to compare the performance of the baseline model (trained only on labeled data) and Soft Teacher (semi-supervised learning) over the same number of epochs. So the comparison between your Soft Teacher model and the baseline model (supervised training on labeled data) is under the same number of iterations and the same number of labeled samples in each mini-batch?
For the partially labeled data setting, the answer is yes. However, for the fully labeled data setting, we train the model as long as possible to make sure it reaches its best performance.
Thank you for your kind reply!
Hi, I wonder why you set `epoch_length=7330` in `DistributedGroupSemiBalanceSampler`. If parameters such as `gpu_nums`, `samples_per_gpu`, or the dataset length change, the number of iterations in one epoch will change as well. I think you could set `epoch_length = int(self.total_size / self.samples_per_gpu / self.num_replicas)` instead.
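For illustration, here is a standalone sketch of the formula suggested above. The function name is hypothetical, and it assumes `total_size` is padded to divide evenly across replicas and per-GPU batches, as a typical `DistributedSampler` does; the actual sampler may compute it differently.

```python
import math

def dynamic_epoch_length(dataset_len: int, samples_per_gpu: int, num_replicas: int) -> int:
    # Pad the dataset so it splits evenly across replicas and per-GPU batches
    # (assumption: total_size is derived this way in the sampler).
    global_batch = samples_per_gpu * num_replicas
    total_size = math.ceil(dataset_len / global_batch) * global_batch
    # Iterations needed for every replica to see its shard once per epoch.
    return total_size // global_batch

# Illustrative numbers only: 40,000 images on 8 GPUs with 5 samples each.
print(dynamic_epoch_length(40000, samples_per_gpu=5, num_replicas=8))  # -> 1000
```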