idstcv / SeCu

PyTorch Implementation for SeCu
Apache License 2.0

Reproducing self-labeling results on CIFAR10 #3

Closed · Lumik7 closed this issue 3 months ago

Lumik7 commented 4 months ago

Hi, first, thank you for this nice work and for open sourcing your code.

I was able to reproduce your results on CIFAR10 (without self-labeling) with your code base, but I encountered some issues and open questions when using the SCAN library to conduct the self-labeling fine-tuning experiments. It would be great if you could help me with my reproduction effort.

Question 1) SCAN uses a linear cluster head for self-labeling, whereas SeCu uses a Projection+Prediction+ClusterHead design. Did you reuse SeCu's head during self-labeling, or did you train a new one based on SCAN's design using only the ResNet18 backbone of SeCu? Mainly, I am asking whether you replaced this line in SCAN:

self.cluster_head = nn.ModuleList([nn.Linear(self.backbone_dim, nclusters) for _ in range(self.nheads)])

with a head that suits SeCu, like:

self.cluster_head = nn.Sequential(
    nn.Linear(self.backbone_dim, self.backbone_dim),
    nn.BatchNorm1d(self.backbone_dim),
    nn.ReLU(inplace=True),
    nn.Linear(self.backbone_dim, 128),
    nn.Linear(128, nclusters, bias=False))

Question 2) I tried both options, but I receive the "Mask in MaskedCrossEntropyLoss is all zeros." error message from SCAN's selflabel.py script. This is already with your specified parameters and a threshold of 0.9. In the appendix of the arXiv version of SeCu, the Self-labeling paragraph states:

Before selecting the confident instances by the prediction from the weak augmentation with a threshold of 0.9, we have a warm-up period with 10 epochs, where all instances are trained with the fixed pseudo label from the assignment of pre-trained SeCu.

I could not find an option for a warm-up period in SCAN, and I am not sure how exactly you implemented this. It would be great if you could clarify the self-labeling process in more detail, especially the warm-up period.

Thank you and thanks again for the great work.

Best, Lukas

qian-qi commented 3 months ago

Hi,

Thank you for your interest. We reimplemented self-labeling ourselves, so it may differ slightly from SCAN's implementation.

A1: To make self-labeling consistent with the clustering stage, we keep the architecture with the MLP head. However, it is a 2-layer MLP for CIFAR-10, and similar performance can be obtained with the original R18.
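
Roughly, the head has the following form; this is a sketch only, where the hidden and embedding dimensions are for illustration and may not match the released configuration:

import torch.nn as nn

class MLPClusterHead(nn.Module):
    # Sketch: a 2-layer MLP followed by the cluster classifier.
    # hidden_dim and embed_dim are illustrative values, not from the paper.
    def __init__(self, backbone_dim, nclusters, hidden_dim=512, embed_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(backbone_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, embed_dim),
        )
        self.classifier = nn.Linear(embed_dim, nclusters, bias=False)

    def forward(self, features):
        # features: backbone output of shape (batch_size, backbone_dim)
        return self.classifier(self.mlp(features))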

A2: Our method keeps a pseudo label for each instance, which can be accessed via the get_label function in secu/builder.py. We use that label for warm-up and optimize the logits from the strong augmentation with a standard cross-entropy loss.
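
A rough sketch of the resulting self-labeling loop (the names model, fixed_labels, and loader are placeholders, and this is not our exact implementation):

import torch
import torch.nn.functional as F

def self_label_epoch(model, loader, fixed_labels, optimizer, epoch,
                     warmup_epochs=10, threshold=0.9):
    # loader yields (weak augmentation, strong augmentation, instance index);
    # fixed_labels holds the pseudo labels from the pre-trained SeCu assignment.
    model.train()
    for weak, strong, idx in loader:
        logits_s = model(strong)
        if epoch < warmup_epochs:
            # Warm-up: train all instances with the fixed pseudo labels.
            loss = F.cross_entropy(logits_s, fixed_labels[idx])
        else:
            # After warm-up: keep only instances whose weak-augmentation
            # prediction is confident (max softmax probability >= threshold).
            with torch.no_grad():
                probs_w = F.softmax(model(weak), dim=1)
                conf, target = probs_w.max(dim=1)
                mask = conf >= threshold
            if mask.sum() == 0:
                continue  # no confident instance in this batch
            loss = F.cross_entropy(logits_s[mask], target[mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()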

Lumik7 commented 3 months ago

Thank you for your response. Would it be possible for you to share your adapted SCAN script?

If not, I will try to implement this myself and will let you know whether it works :)

qian-qi commented 3 months ago

Currently, we do not have time to clean up the self-labeling code. Please let me know if you have any further questions.

Lumik7 commented 3 months ago

Alright, thank you. I will be able to work on this again next week and will get back to you if I encounter any other issues. I will close this issue for now and reopen it in case of new questions.

Lumik7 commented 2 months ago

Hi,

just wanted to let you know that I was able to reproduce the self-labeling results of SeCu on CIFAR10.

One minor problem I encountered is that I had to use Adam (with SCAN's default settings for CIFAR10) instead of SGD as described in the paper:

Besides, SGD is adopted for self-labeling with 100 epochs on small data sets and 11 epochs on ImageNet.

When I used SGD, the softmax cluster probabilities were at most 0.40 after the warm-up phase of 10 epochs. These probabilities are much lower than the threshold of 0.90 that SeCu uses during self-labeling, which resulted in the "Mask in MaskedCrossEntropyLoss is all zeros" error in the SCAN library. Replacing SGD with Adam did the trick, resulting in higher softmax probabilities and the improved accuracy.
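
For reference, the change amounted to something like the following; the Adam hyperparameters are my reading of SCAN's CIFAR10 self-labeling defaults, not values from the SeCu paper:

import torch

# Sketch of the optimizer swap; lr and weight_decay follow what I believe are
# SCAN's CIFAR10 self-labeling defaults and may need adjustment.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)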

So I am not sure whether I missed something in the SGD setup or whether it is a typo in the paper, but Adam worked fine in the end.

Thanks again for the help and all the best, Lukas

qian-qi commented 2 months ago

Thanks for your efforts. The difference may come from the implementation, e.g., the lr warm-up, the SCAN library, etc. SGD should achieve similar performance to Adam on ResNet.

Will0x6c5f commented 2 months ago

Hello Lukas, would you mind sharing your reproduced code, particularly the evaluation and self-labeling parts? I'm also interested in reproducing the results, but I'm having trouble adapting the SCAN code (see https://github.com/idstcv/SeCu/issues/5#issue-2280733391). It would be of great help!