zhunzhong07 / NCL

Neighborhood Contrastive Learning for Novel Class Discovery, CVPR 2021

Reproducing the results on CIFAR100 #3

Closed. vlfom closed this issue 3 years ago.

vlfom commented 3 years ago

Dear authors,

Thank you for your great work and clean code.

I was trying to reproduce the results from the paper (w/ HNG) on the CIFAR10 and CIFAR100 datasets; however, I obtained significantly lower results on CIFAR100:

CIFAR10: 93.4±0.2%
CIFAR100: 81.6±0.8%

The CIFAR10 results match the paper's almost exactly, while the CIFAR100 score reported in the paper is 86.6±0.4%, about 5% higher than what I obtain.

To obtain the results, I ran the exact commands (w/ HNG) from the README 3 times. The training on CIFAR100 was pretty stable, but I picked the peak results anyway.

Just for reference, the final command that ncl_hng_cifar100.sh executes is: ncl_cifar.py --dataset_root ./data/datasets/CIFAR/ --exp_root ./experiments --warmup_model_dir ./data/experiments/pretrained/supervised_learning/resnet_rotnet_cifar100.pth --lr 0.1 --gamma 0.1 --weight_decay 1e-4 --step_size 170 --batch_size 128 --epochs 200 --rampup_length 150 --rampup_coefficient 50 --num_labeled_classes 80 --num_unlabeled_classes 20 --dataset_name cifar100 --seed 3 --model_name resnet_cifar100_ncl_hng --mode train --hard_negative_start 3 --bce_type cos

Could you please check that the script provided for CIFAR100 is correct? For example, I noticed that utils/ramps.py was missing from the RankStats repository, so maybe the pushed version is missing some final changes.

zhunzhong07 commented 3 years ago

Hi @vlfom , thanks for your interest in our work.

  1. Thanks for pointing it out. We have included ramps.py in utils now.
  2. I have re-run the code exactly as it is on GitHub, and I obtain 'Test acc 0.8672, nmi 0.8056, ari 0.7527' for the last epoch. Could you try it again? If it still produces lower results, you can try different seeds, and let's see how it changes.
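For reference, acc/nmi/ari are the standard novel-class-discovery clustering metrics. Below is a minimal, self-contained sketch of how such metrics are typically computed; the `cluster_acc` helper is illustrative (Hungarian matching between predicted clusters and ground-truth labels) and is not claimed to be the repo's exact implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def cluster_acc(y_true, y_pred):
    """Clustering accuracy under the optimal cluster-to-label assignment."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = max(y_pred.max(), y_true.max()) + 1
    # Contingency matrix: w[p, t] counts samples predicted p with true label t.
    w = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    # Hungarian algorithm maximizes total matched count (minimize negated cost).
    row, col = linear_sum_assignment(w.max() - w)
    return w[row, col].sum() / y_pred.size

# Toy example: a perfect clustering under a permuted labelling.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [2, 2, 0, 0, 1, 1]
acc = cluster_acc(y_true, y_pred)
nmi = normalized_mutual_info_score(y_true, y_pred)
ari = adjusted_rand_score(y_true, y_pred)
```

Because the matching is permutation-invariant, the relabelled-but-consistent prediction above scores 1.0 on all three metrics.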
vlfom commented 3 years ago

Thank you for the prompt reply!

The results you report look to me like scores on the train split. The real test inference happens only in the very last line of ncl_cifar.py: https://github.com/zhunzhong07/NCL/blob/main/ncl_cifar.py#L225.

This is the output of the last lines that I get when I clone your repository and just execute ncl_hng_cifar100.sh: [screenshot of the final log lines]

You can see that on the test split I get 78%. The random seed in this run was not the best one; on average I get a higher 81.6±0.8%, as reported above.

Because the seeds are set manually, I am not sure how your score of 0.8672 can be achieved on the test data after cloning this repository. However, I also want to add that your result of 86.7% matches my training results: on the training dataset, my accuracy is 86.6±0.4%, very similar to yours.

However, Table 4 reports scores on the test split for DTC, RankStats, and the other methods, so the comparison would be unfair if this is the case.

Please correct me if I'm wrong, and thanks a lot for your time.

JosephKJ commented 3 years ago

Hi @vlfom : None of the existing methods test on the test set of the unlabelled pool. They give out the numbers on the unlabelled pool, which indeed is the train split of the unlabelled pool.

You can get this clarified from this ICCV 2021 paper too: "Note that as no supervision is used for unlabelled data, the same data are used for both training and evaluation following standard practice [23, 16]." (Section 4, 1st para of https://arxiv.org/abs/2104.12673)
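To make the protocol concrete, here is a small illustrative sketch of the CIFAR-100 setup discussed in this thread (80 labelled / 20 unlabelled classes); the `split_role` helper is hypothetical, for exposition only, and does not appear in the repo:

```python
# Novel-class-discovery protocol on CIFAR-100: the train-split images of the
# unlabelled classes serve as both the discovery (training) data and the
# evaluation data, since their labels are never used during training.
NUM_CLASSES = 100
NUM_LABELLED = 80

labelled_classes = set(range(NUM_LABELLED))              # classes 0..79
unlabelled_classes = set(range(NUM_LABELLED, NUM_CLASSES))  # classes 80..99

def split_role(class_id, split):
    """Role a (class, split) pair plays under this evaluation protocol."""
    if class_id in labelled_classes:
        return "supervised training" if split == "train" else "labelled test"
    # Unlabelled classes: the same train-split pool is reused for evaluation.
    return "discovery + evaluation" if split == "train" else "held-out (optional)"
```

Under this convention, the 86.6% figure in the paper and the 86.7% train-split number reported above refer to the same pool of images.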

@zhunzhong07 : Please correct me if I am wrong.

vlfom commented 3 years ago

I see, indeed, thanks a lot for clarifying! @JosephKJ

zhunzhong07 commented 3 years ago

> Hi @vlfom : None of the existing methods test on the test set of the unlabelled pool. They give out the numbers on the unlabelled pool, which indeed is the train split of the unlabelled pool.
>
> You can get this clarified from this ICCV 2021 paper too: "Note that as no supervision is used for unlabelled data, the same data are used for both training and evaluation following standard practice [23, 16]." (Section 4, 1st para of https://arxiv.org/abs/2104.12673)
>
> @zhunzhong07 : Please correct me if I am wrong.

@JosephKJ You are exactly right. Thanks for your clarification!

@vlfom For results on the test set, please also refer to our ICCV 2021 work: https://arxiv.org/abs/2104.12673.