kangzhiq / NNCSL

Official implementation of NNCSL
MIT License

Reproduction Problem #4

Open ANANAN0981 opened 7 months ago

ANANAN0981 commented 7 months ago

Hi!

I've followed the instructions provided in the Readme.md file and executed the original code with the provided .yaml files from your repository. Here are the commands I used:

python main.py --sel nncsl_train  --fname configs/nncsl/cifar10/cifar10_0.8%_buffer500_nncsl.yaml
python main.py --sel nncsl_train  --fname configs/nncsl/cifar10/cifar10_5%_buffer500_nncsl.yaml
python main.py --sel nncsl_train  --fname configs/nncsl/cifar10/cifar10_25%_buffer500_nncsl.yaml
python main.py --sel nncsl_train  --fname configs/nncsl/cifar100/cifar100_0.8%_buffer500_nncsl.yaml
python main.py --sel nncsl_train  --fname configs/nncsl/cifar100/cifar100_0.8%_buffer5120_nncsl.yaml
python main.py --sel nncsl_train  --fname configs/nncsl/cifar100/cifar100_5%_buffer500_nncsl.yaml
python main.py --sel nncsl_train  --fname configs/nncsl/cifar100/cifar100_25%_buffer500_nncsl.yaml

However, I noticed that the results I obtained differ from yours by varying degrees across different settings:

| Data | CIFAR-10 | CIFAR-10 | CIFAR-10 | CIFAR-100 | CIFAR-100 | CIFAR-100 | CIFAR-100 |
|---|---|---|---|---|---|---|---|
| ratio-memory | 0.8%-500 | 5%-500 | 25%-500 | 0.8%-500 | 0.8%-5120 | 5%-500 | 25%-500 |
| NNCSL | 73.2±0.1 | 77.2±0.2 | 77.3±0.1 | 27.4±0.5 | 27.5±0.7 | 31.4±0.4 | 35.3±0.3 |
| Reproduction | 67.69 | 66.64 | 63.82 | 25.26 | 25.34 | 33.81 | 31.72 |

An anomalous phenomenon is that a higher label ratio leads to lower accuracy on CIFAR-10, and on CIFAR-100 between the 5% and 25% label ratios. It seems that there might be some discrepancies in the parameters or configurations. To further investigate, I've attached the .log files and .yaml files to this issue. I hope these materials provide additional insight into the problem.

Looking forward to your response.

Best wishes!

NNSCL_logs_yaml.zip

kangzhiq commented 7 months ago

Hi!

Thanks for your interest in our work!

Sorry to hear that there is a reproduction issue. The results indeed don't look reasonable. I will have a look at your files and get back to you ASAP. Please don't worry, we will figure it out together!

Best, Zhiqi

kangzhiq commented 6 months ago

@ANANAN0981 Hi! Sorry for the late response! I was occupied with something else and couldn't get a chance to check this properly. I have identified the issue and updated the code. In the meantime, I am sharing the training logs (not complete yet, more files to come) here.

Could you please pull the changes to your local repository and launch the experiments again to see if they are consistent with my logs?

Thanks again for your interest in our work!

ANANAN0981 commented 6 months ago

Hi!

I am going to run the updated code and report the log files. Another problem that occurs to me concerns the memory buffer.

As clarified in your experiment settings, the replayed samples are all labeled. Then the memory buffers of "NNCSL_500" and "NNCSL_5120" are identical on CIFAR-10-0.8%, since they share the same 400 labeled samples, which is fewer than both buffer sizes. Their accuracy results also look similar. Are they supposed to be the same experiment?

If so, is 5120 not the real memory buffer size? Does the real size depend on the specific dataset?

Thank you!

kangzhiq commented 6 months ago

Hi!

> I am going to run the updated code and report the log files.

Great, hope everything is correct this time.

> As clarified in your experiment settings, the replayed samples are all labeled. Then the memory buffers of "NNCSL_500" and "NNCSL_5120" are identical on CIFAR-10-0.8%, since they share the same 400 labeled samples, which is fewer than both buffer sizes. Their accuracy results also look similar. Are they supposed to be the same experiment?

Yes, your intuition is correct. We didn't fully explore the potential of the larger buffer size (5120) when the labeled data are very scarce (0.8%). In some follow-up experiments we simply pseudo-labeled unlabeled data and treated them as labeled samples to be stored in the buffer, which significantly improved the performance. We didn't put it in the paper because it was not the main focus of this work. 😀
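To make the idea concrete, here is a minimal sketch of that follow-up experiment; the model, loader and buffer names and the confidence threshold are illustrative placeholders, not the repository's actual API:

```python
import torch

@torch.no_grad()
def fill_buffer_with_pseudo_labels(model, unlabeled_loader, buffer, threshold=0.95, device="cuda"):
    """Sketch: treat confident predictions on unlabeled data as if they were labeled,
    and store them in the replay buffer. `model`, `unlabeled_loader`, `buffer` and
    `threshold` are illustrative placeholders, not the repository's actual objects."""
    model.eval()
    for images, _ in unlabeled_loader:          # assumed to yield (image batch, ignored target)
        images = images.to(device)
        probs = torch.softmax(model(images), dim=1)
        confidence, pseudo_label = probs.max(dim=1)
        keep = confidence >= threshold          # keep only high-confidence predictions
        for img, lbl in zip(images[keep].cpu(), pseudo_label[keep].cpu()):
            buffer.append((img, int(lbl)))      # stored exactly like a labeled sample
```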

> If so, is 5120 not the real memory buffer size? Does the real size depend on the specific dataset?

Yes, it depends.

Hope it helps!

ANANAN0981 commented 6 months ago

Hi!

I have rerun the new code, and the results are as follows.

| Data | CIFAR-10 | CIFAR-10 | CIFAR-10 | CIFAR-100 | CIFAR-100 | CIFAR-100 |
|---|---|---|---|---|---|---|
| ratio-memory | 0.8%-500 | 5%-500 | 25%-500 | 0.8%-500 | 5%-500 | 25%-500 |
| NNCSL | 73.2±0.1 | 77.2±0.2 | 77.3±0.1 | 27.4±0.5 | 31.4±0.4 | 35.3±0.3 |
| Reproduction | 72.95 | 77.24 | 77.61 | 26.76 | 35.41 | 31.49 |

The reproduction results on CIFAR-10 are now very close to yours, which is great! However, the inverted tendency on CIFAR-100 between the 5% and 25% label ratios is still confusing. I have also attached the .log files of these two settings; I hope this supplement helps to deal with the problem. cifar100_5%_buffer500_nncsl.log

cifar100_25%_buffer500_nncsl.log

kangzhiq commented 6 months ago

Hi!

Sorry for the late response, I have been super occupied this week.

Good to hear that the cifar10 experiments are fine now!

The CIFAR-100 results look pretty strange, to be honest. I will have a look at them this week and get back to you ASAP!

kangzhiq commented 5 months ago

@ANANAN0981

Hi,

Here is a quick fix. The performance gap is due to the hyperparameters.

We can get 34.63% accuracy with 25% labeled data if we simply divide the learning rates lr and lr_cls by 2, namely lr=0.6 and lr_cls=0.06. I think that with more labeled data available, the model greedily learns the new task and therefore forgets more of the previous tasks. That is why simply reducing the learning rate helps. I am attaching the training log here: c100_25%_buf500_lr.log.
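In case it is useful, here is a minimal sketch of deriving such a halved-learning-rate config from the provided yaml; the traversal makes no assumption about where lr and lr_cls sit in the file, and the output filename is only illustrative:

```python
import yaml

# Minimal sketch: write a copy of the provided config with lr and lr_cls halved,
# as suggested above (e.g. lr -> 0.6, lr_cls -> 0.06 for cifar100 25%).
SRC = "configs/nncsl/cifar100/cifar100_25%_buffer500_nncsl.yaml"
DST = "configs/nncsl/cifar100/cifar100_25%_buffer500_nncsl_half_lr.yaml"  # illustrative name

def halve_lr(node):
    """Recursively halve any numeric value stored under a key named 'lr' or 'lr_cls'."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ("lr", "lr_cls") and isinstance(value, (int, float)):
                node[key] = value / 2
            else:
                halve_lr(value)
    elif isinstance(node, list):
        for item in node:
            halve_lr(item)

with open(SRC) as f:
    cfg = yaml.safe_load(f)
halve_lr(cfg)
with open(DST, "w") as f:
    yaml.safe_dump(cfg, f)
```

The new file can then be launched exactly like the original configs, e.g. python main.py --sel nncsl_train --fname configs/nncsl/cifar100/cifar100_25%_buffer500_nncsl_half_lr.yaml.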

For sure these hyperparameters are not optimal, and I am running a grid search to find better ones. I will come back to you once the experiments are done.

Hope this can be helpful. 😀

Zhiqi

ANANAN0981 commented 3 months ago

Hi,

Another problem I encountered is an inconsistency between the computation of the SNN pseudo-labels in the paper and in the code implementation. Eq. (1) of the SNN classifier is

$$p_i = \sum_{k=1}^{K} \frac{\exp(z_i \cdot h^k / \tau)}{\sum_{k'=1}^{K} \exp(z_i \cdot h^{k'} / \tau)}\, y^k$$

where z_i is the query embedding, S = [h^1, ..., h^K] are the features of the support samples, and y^k is the one-hot ground-truth vector associated with h^k. The code corresponding to this step is

```python
def snn(query, supports, labels):
    """ Soft Nearest Neighbours similarity classifier """
    # Note: softmax, tau and AllGather are not defined in this snippet; in the
    # repository they are provided by the surrounding code (a softmax over dim 1,
    # the temperature, and a distributed all-gather helper).
    # Step 1: normalize embeddings
    query = torch.nn.functional.normalize(query)
    supports = torch.nn.functional.normalize(supports)

    # Step 2: gather embeddings from all workers
    supports = AllGather.apply(supports).detach()

    # Step 3: compute similarity between local embeddings
    return softmax(query @ supports.T / tau) @ labels
```

```python
probs = snn(anchor_views, anchor_supports, anchor_support_labels)
```

where 'anchor_support_labels' corresponds to y^k. We print the 'anchor_support_labels' as follows:

```
tensor([[0.9100, 0.0100, 0.0100,  ..., 0.0100, 0.0100, 0.0100],
        [0.0100, 0.9100, 0.0100,  ..., 0.0100, 0.0100, 0.0100],
        [0.0100, 0.0100, 0.9100,  ..., 0.0100, 0.0100, 0.0100],
        ...,
        [0.0100, 0.0100, 0.0100,  ..., 0.9100, 0.0100, 0.0100],
        [0.0100, 0.0100, 0.0100,  ..., 0.0100, 0.9100, 0.0100],
        [0.0100, 0.0100, 0.0100,  ..., 0.0100, 0.0100, 0.9100]],
       device='cuda:0')
```

and anchor_support_labels.max(1)[1] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, ..., 8, 9]. But the real ground truth of 'anchor_supports' is slabels = tensor([5, 6, 1, 2, 0, 8, 9, 3, 7, 4, 5, 6, 1, 2, 0, 8, 9, 3, 7, 4, 5, 6, 1, 2, 0, 8, 9, 3, 7, 4, 5, 6, 1, 2, 0, 8, 9, 3, 7, 4, 5, 6, 1, 2, 0, 8, 9, 3, 7, 4, 5, 6, 1, 2, 0, 8, 9, 3, 7, 4], device='cuda:5'). If the input anchor_support_labels is correct, what does 'probs' represent?
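For reference, here is a minimal, self-contained toy version of the same computation (shapes and values are made up); it only makes explicit the mechanics: probs is a softmax-similarity-weighted average of the label rows, so row k of labels is interpreted as the class distribution of row k of supports.

```python
import torch

# Toy, self-contained version of the snn() computation above, with made-up data.
torch.manual_seed(0)
tau = 0.1
num_support, num_query, dim, num_classes = 20, 4, 16, 10

supports = torch.nn.functional.normalize(torch.randn(num_support, dim), dim=1)
query = torch.nn.functional.normalize(torch.randn(num_query, dim), dim=1)

# Smoothed one-hot label matrix, analogous to the 0.91 / 0.01 matrix printed above
support_classes = torch.randint(0, num_classes, (num_support,))
labels = torch.full((num_support, num_classes), 0.01)
labels[torch.arange(num_support), support_classes] = 0.91

probs = torch.softmax(query @ supports.T / tau, dim=1) @ labels
print(probs.shape)       # (num_query, num_classes): one soft pseudo-label per query view
print(probs.sum(dim=1))  # each row sums to 1, since every labels row sums to 1
```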

To further investigate, we changed 'anchor_support_labels' to the variable 'olabels' in your code, which is the one-hot vector of the real ground truth. The PAWS module then didn't work: the accuracy plunged to less than 10% in the first task of cifar100_0.8%.

This problem confuses me a lot. We need to figure it out, since it is vital for the nearest-neighbour classifier. Do you have any clue about this problem?

Looking forward to your response.