wvangansbeke / Unsupervised-Classification

SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
https://arxiv.org/abs/2005.12320

pretext + kmeans results of ImageNet-50 #61

Open ZhiyuanDang opened 3 years ago

ZhiyuanDang commented 3 years ago

Hi, thanks for your excellent work.

I ran the K-means algorithm (faiss-gpu based) over the L2-normalized pretext features (i.e., the MoCo v2 checkpoint). The result on the training set is around 66% ACC, but on the test set it is around 38% ACC (assigning test samples to the training centroids by L2 distance). However, Table 4 of the paper reports a test ACC of around 65% for ImageNet-50.

Have you encountered this problem? Which K-means implementation do you use?
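For reference, a minimal sketch of the faiss-gpu k-means setup described above, assuming the pretext features have already been extracted into float32 NumPy arrays `train_feats` and `test_feats` (hypothetical names, not from the repo):

```python
import faiss
import numpy as np

def l2_normalize(x):
    # Row-wise L2 normalization of the feature matrix.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Hypothetical inputs: MoCo v2 pretext features as float32 arrays.
train_feats = l2_normalize(train_feats).astype(np.float32)
test_feats = l2_normalize(test_feats).astype(np.float32)

d, k = train_feats.shape[1], 50  # 50 clusters for ImageNet-50

# Fit k-means on the training features (faiss uses L2 distance by default).
kmeans = faiss.Kmeans(d, k, niter=100, gpu=True, seed=0)
kmeans.train(train_feats)

# Assign each split to its nearest training centroid.
_, train_assign = kmeans.index.search(train_feats, 1)
_, test_assign = kmeans.index.search(test_feats, 1)
```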

wvangansbeke commented 3 years ago

Hi @ZhiyuanDang,

I noticed that quite a lot of people have trouble with the implementation of KMeans. I might upload a script after the NeurIPS deadline. For now, did you check the previous issues, e.g., issue #49?

ZhiyuanDang commented 3 years ago

Thanks for your reply. I checked the corresponding issue #49 before opening this one.

It is strange that the train and test k-means results on CIFAR-10/20 are fine, while on the ImageNet subset there is a large gap between the two.

How do you address that?

wvangansbeke commented 3 years ago

Hi @ZhiyuanDang,

You should obtain more or less the same accuracy for the train and test sets on ImageNet (the distributions are very similar). I always report the numbers on the test set, so I'm not really sure what the issue is; I will verify it later. It is indeed strange that you obtain such a large discrepancy between train and test accuracies, especially since you get the correct results on CIFAR-10. I suspect there is still something wrong with your evaluation.

In the meantime, can you try on the complete ImageNet dataset or other datasets?

ZhiyuanDang commented 3 years ago

With the same evaluation method (from #49), CIFAR-10/20 and STL-10 achieve the same train and test ACC.

However, ImageNet-50 still obtains 65% training ACC and 38% test ACC. I will report the results for ImageNet-100/200 later.

-- ImageNet-100: 59% training ACC and 37% test ACC.

-- ImageNet-200: 52% training ACC and 37% test ACC.

Besides that, ImageNet-10, another ImageNet subset, reaches 96% training ACC and 62% test ACC.
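For context, the ACC numbers above are typically computed with Hungarian matching between predicted clusters and ground-truth classes; a minimal sketch using scipy, with hypothetical integer label arrays `y_true` and `y_pred`:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred, num_classes):
    # Contingency matrix: rows are predicted clusters, columns are classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[p, t] += 1
    # Hungarian matching: the one-to-one cluster-to-class assignment
    # that maximizes the number of correctly labeled samples.
    rows, cols = linear_sum_assignment(cm.max() - cm)
    return cm[rows, cols].sum() / len(y_true)
```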

wvangansbeke commented 3 years ago

Hi @ZhiyuanDang,

FYI, I double-checked and got the same results as in the paper.

ZhiyuanDang commented 3 years ago

Hi @wvangansbeke,

Thanks for your reply.

Could you please release the k-means code for reference?

Li-Hyn commented 2 years ago


@wvangansbeke @ZhiyuanDang

I'm sorry to bother you, but has there been any progress on this after a year?

I'm also reproducing the pretext + k-means part and can't reproduce the accuracy reported in the paper at the moment. Could you please explain it in detail, or release the k-means code for reference?

wvangansbeke commented 2 years ago

Make sure that you L2-normalize the features and that you report the results on the test set (use the train set to fit the k-means). It doesn't even matter much which implementation or distance metric you use; the results are all pretty close. Also have a look at our repo on Unsupervised Semantic Segmentation and check out the k-means clustering part; it is more or less the same for classification. Honestly, the implementation is quite straightforward here. I have seen papers that were able to reproduce it. I will release it if I find the time.
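Putting that advice together, a sketch of the protocol described above, reusing the hypothetical helpers from the earlier snippets in this thread (`l2_normalize`, `clustering_accuracy`, and a `test_labels` array are assumptions, not repo code):

```python
# L2-normalize, fit k-means on the train split, report ACC on the test split.
train_feats = l2_normalize(train_feats).astype(np.float32)
test_feats = l2_normalize(test_feats).astype(np.float32)

kmeans = faiss.Kmeans(train_feats.shape[1], 50, niter=100, gpu=True, seed=0)
kmeans.train(train_feats)  # centroids come from the train set only

# Assign test samples to the train centroids, then Hungarian-match.
_, test_assign = kmeans.index.search(test_feats, 1)
test_acc = clustering_accuracy(test_labels, test_assign.ravel(), 50)
```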

Li-Hyn commented 2 years ago


Thanks for your kind reply, I'll try the methods you mentioned!