wvangansbeke / Unsupervised-Classification

SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
https://arxiv.org/abs/2005.12320

The Implementation result of pretext task + kmeans #49

Closed forrestsz closed 3 years ago

forrestsz commented 3 years ago

Hi, thanks for your nice work; it has inspired me a lot. I want to reproduce the result of the pretext task + k-means on CIFAR-10 (65% ACC in the paper). First, I downloaded the checkpoint from here: https://drive.google.com/file/d/1Cl5oAcJKoNE5FSTZsBSAKLcyA5jXGgTT/view Then, I added some code to eval.py as follows:

        print('Fill Memory Bank')
        fill_memory_bank(dataloader, model, memory_bank)

        if not args.simclr_kmeans:
            print('Mine the nearest neighbors')
            for topk in [1, 5, 20]:  # Similar to Fig 2 in paper
                _, acc = memory_bank.mine_nearest_neighbors(topk)
                print('Accuracy of top-{} nearest neighbors on validation set is {:.2f}'.format(topk, 100 * acc))
        else:
            head = 0
            print(memory_bank.features.cpu().shape)
            # Cluster the memory-bank features with k-means, then evaluate
            # the cluster assignments against the targets via Hungarian matching
            kmeans = KMeans(n_clusters=config['num_classes'], random_state=0).fit(memory_bank.features.cpu())
            cluster_labels = torch.from_numpy(kmeans.labels_).cuda()
            predictions = [{'predictions': cluster_labels, 'probabilities': 1, 'targets': memory_bank.targets}]
            clustering_stats = hungarian_evaluate_me(head, predictions, dataset.classes,
                                                     compute_confusion_matrix=True)
            print(clustering_stats)

But I get the following result, which is far below the 65% reported in the paper:

{'ACC': 0.3647, 'ARI': 0.13848755246278868, 'NMI': 0.2627059928586838, 'hungarian_match': [(0, 2), (1, 1), (2, 8), (3, 3), (4, 5), (5, 9), (6, 0), (7, 4), (8, 6), (9, 7)]}

Maybe I made some mistake in the calculation; can you tell me where I went wrong? Many thanks for your time!
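For reference, here is a minimal sketch of the Hungarian-matching accuracy that an evaluation helper like `hungarian_evaluate_me` would compute (the helper above is the poster's own function; this standalone version, using only `numpy` and `scipy`, is an assumption about its core logic):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_accuracy(predictions, targets, num_classes):
    """Clustering accuracy: find the one-to-one mapping from cluster ids
    to class labels that maximizes agreement, then score the remapped
    predictions against the ground truth."""
    # Contingency matrix: cost[i, j] = how often cluster i co-occurs with class j
    cost = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(predictions, targets):
        cost[p, t] += 1
    # linear_sum_assignment minimizes, so subtract from the max to maximize matches
    row, col = linear_sum_assignment(cost.max() - cost)
    match = dict(zip(row, col))
    remapped = np.array([match[p] for p in predictions])
    return (remapped == np.asarray(targets)).mean()
```

The `hungarian_match` list in the printed stats above corresponds to the `(row, col)` pairs found here.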

wvangansbeke commented 3 years ago

Hi @linqinghong,

Thank you for your interest. There are indeed a few issues.

Hope this helps.

forrestsz commented 3 years ago

Thanks for your reply, I will try your suggestions soon. Thanks!

wvangansbeke commented 3 years ago

OK. Please reach out if something goes wrong. Closing this issue for now.

cag472 commented 3 years ago

I am having this same issue.

Thanks for any advice

07Agarg commented 3 years ago

Hi @cag472,

I was also getting around 45% k-means clustering accuracy on the training features taken before the MLP head.

But it improved to around 65% ACC when I l2-normalized those features first. I would recommend trying l2-normalization.
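The suggestion above can be sketched as follows (variable names are illustrative; this assumes scikit-learn's `KMeans` and `normalize`, not any function from this repo):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def kmeans_on_normalized(features, num_classes, seed=0):
    """Run k-means on l2-normalized feature vectors.

    Each row of `features` is scaled to unit norm before clustering, so
    Euclidean distances between rows reflect cosine similarity.
    """
    feats = normalize(features, norm='l2', axis=1)
    km = KMeans(n_clusters=num_classes, random_state=seed, n_init=10).fit(feats)
    return km.labels_
```

In the eval snippet above, this would mean normalizing `memory_bank.features.cpu()` before the `.fit(...)` call.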

Thanks.

TsungWeiTsai commented 3 years ago

Using spherical k-means provides slightly better results when applied to the training+testing set. Please refer to MiCE (ICLR 2021).
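For anyone unfamiliar with it, here is a minimal spherical k-means sketch (this is not MiCE itself; the farthest-point initialization is my own choice for determinism). Points and centroids live on the unit sphere, and assignment uses cosine similarity rather than Euclidean distance:

```python
import numpy as np

def spherical_kmeans(X, k, n_iters=50):
    """Spherical k-means: cluster unit-norm vectors by cosine similarity,
    re-normalizing each centroid after every update."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Farthest-point initialization: start from the first point, then
    # repeatedly add the point least similar to the current centroids
    centroids = [X[0]]
    for _ in range(k - 1):
        sims = np.stack([X @ c for c in centroids]).max(axis=0)
        centroids.append(X[sims.argmin()])
    centroids = np.stack(centroids)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        labels = (X @ centroids.T).argmax(axis=1)  # assign by cosine similarity
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)  # project back to the sphere
    return labels
```

Note that running standard k-means on l2-normalized features (as suggested above) is a close cousin of this, but spherical k-means also keeps the centroids on the sphere between iterations.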