nayeemrizve / ups

"In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning" by Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah (ICLR 2021)
MIT License
231 stars 40 forks

About the unlabeled data. #3

Closed Kouuh closed 3 years ago

Kouuh commented 3 years ago

I have a question about the unlabeled dataset. After you filter out the positively pseudo-labeled samples, are those samples removed from the unlabeled dataset? I couldn't find this step in the code.

nayeemrizve commented 3 years ago

Our get_cifarX function returns four datasets.

Ref: https://github.com/nayeemrizve/ups/blob/f003e3fcb0316b21904499ada4b65a765198fcb8/data/cifar.py#L88

For training, we use train_lbl_dataset (the available labeled set plus the positively pseudo-labeled set, trained with the CE loss) and train_nl_dataset (the negatively pseudo-labeled set, trained with the NCE loss).
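To make the two losses concrete, here is a minimal sketch of how a CE loss on a positively pseudo-labeled sample and a negative-learning (NCE) loss on negatively pseudo-labeled classes can be computed. The function names are my own; the NCE form, -log(1 - p_k) summed over the negatively labeled classes, follows the negative-learning formulation described in the paper, but this is an illustration, not the repo's actual implementation:

```python
import math

def ce_loss(probs, positive_label):
    # Standard cross-entropy for a sample with a positive
    # (pseudo-)label: maximize the probability of that class.
    return -math.log(probs[positive_label])

def nce_loss(probs, negative_labels):
    # Negative-learning loss: for classes the model is confident
    # the sample does NOT belong to, push their probability down.
    return -sum(math.log(1.0 - probs[k]) for k in negative_labels)

probs = [0.5, 0.25, 0.25]      # toy softmax output over 3 classes
ce = ce_loss(probs, 0)          # positive pseudo-label: class 0
nce = nce_loss(probs, [1, 2])   # negative pseudo-labels: classes 1, 2
```

Both losses go to zero as the model agrees with the (positive or negative) pseudo-labels, which is why the two selected subsets can simply be trained with their respective losses in the same loop.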

We use train_unlbl_dataset (all of the original unlabeled samples) to generate pseudo-labels at each pseudo-label generation step. Since we do not carry pseudo-labels over from one pseudo-labeling iteration to the next, there is no need to delete the already pseudo-labeled samples from the unlabeled set; their labels are simply regenerated from scratch each time.
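The point above can be sketched as follows: every generation step predicts on the full original unlabeled pool, so previously selected samples are re-scored rather than removed. This is a simplified illustration with hypothetical names (`tau_p`, `tau_n`, `pseudo_label_step`); the actual selection in UPS additionally applies an uncertainty threshold, which is omitted here:

```python
def pseudo_label_step(model, unlbl_set, tau_p, tau_n):
    # One pseudo-label generation step over the ENTIRE original
    # unlabeled pool. Nothing is ever deleted from unlbl_set:
    # labels are regenerated fresh at every call.
    pos, neg = [], []
    for x in unlbl_set:
        probs = model(x)  # assumed to return class probabilities
        k = max(range(len(probs)), key=probs.__getitem__)
        if probs[k] >= tau_p:
            # confident positive pseudo-label -> train_lbl_dataset
            pos.append((x, k))
        negs = [c for c, p in enumerate(probs) if p <= tau_n and c != k]
        if negs:
            # confident negative pseudo-labels -> train_nl_dataset
            neg.append((x, negs))
    return pos, neg

# Toy "model" that just returns precomputed probability vectors.
unlbl = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]
pos, neg = pseudo_label_step(lambda x: x, unlbl, tau_p=0.7, tau_n=0.1)
```

In the toy run, only the first (confident) sample is selected, positively for class 0 and negatively for classes 1 and 2; the second sample is selected for neither, but it stays in the pool and gets another chance at the next generation step.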