Closed: congxin0920 closed this issue 8 months ago.
Thank you for your interest. The datasets can be found by googling their names.
Thank you for your reply. I have another question: how should the trained model be evaluated? Could you please add the evaluation process?
We adopt the same evaluation pipeline as SCAN; you can find the code there.
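For context, SCAN-style clustering evaluation matches predicted cluster ids to ground-truth labels with Hungarian matching and reports the resulting accuracy. This is a minimal sketch of that idea, not the repo's actual code; it assumes `scipy` is available for `linear_sum_assignment`:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy between predicted cluster ids and true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    # Count co-occurrences: rows are predicted clusters, columns true labels.
    count = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    # Hungarian matching maximizes matched mass (minimize the complement).
    row, col = linear_sum_assignment(count.max() - count)
    return count[row, col].sum() / y_true.size
```

With a perfect clustering up to a label permutation (e.g. predicted ids `[1,1,0,0,2,2]` against labels `[0,0,1,1,2,2]`), this returns 1.0.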
I'm sorry to bother you again. With the code you provided, I failed to get the expected clustering result and I can't find the problem. The final loss can be reduced to 1.9e-06. In addition, when computing the cluster evaluation I used the pseudo-labels cur_label predicted by the model, and I noticed that in later training a small batch of images would all be assigned the same category. I am very confused about this and hope to get your reply.
Which dataset did you try, and did you run the right configuration? Due to the augmentation, the final loss should be much larger than what you observed.
I used the cifar10 dataset with the command "sh run_cifar10_entropy.sh 0", and here are the parameters printed during training: Namespace(batch_size=128, clr=1.2, data='./data/cifar10/', data_name='cifar10', dist_backend='nccl', dist_url='tcp://localhost:1234', epochs=401, gpu=None, log='secu_entropy_cifar10', lr=0.2, min_crop=0.3, momentum=0.9, multiprocessing_distributed=True, print_freq=100, rank=0, resume='', secu_alpha=6000.0, secu_cst='entropy', secu_dim=128, secu_dual_lr=0.1, secu_k=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100], secu_lratio=0.9, secu_num_head=10, secu_num_ins=50000, secu_tau=0.2, secu_tw=0.05, secu_tx=0.05, seed=None, start_epoch=0, weight_decay=0.0001, workers=2, world_size=1)
Also, here is my code for reading the dataset; aug_1 and aug_2 are unchanged:
if args.data_name == 'cifar10':
    train_dataset = torchvision.datasets.CIFAR10(
        root=args.data,
        train=True,
        transform=secu.loader.DoubleCropsTransform(
            transforms.Compose(aug_1), transforms.Compose(aug_2)),
        download=True)
else:
    raise TypeError
Can you please post the training log for one epoch? Btw, we use the dataset organized into folders, which is different from your current setting. It is better to follow our setting to reproduce the result.
Thank you very much for your help! As for the dataset, I couldn't find it organized into folders. Could you please share a download link? The log is as follows:
Epoch: [0][  0/391] Time 2.833 ( 2.833) Data 2.164 ( 2.164) Loss 5.7213e+00 (5.7213e+00)
Epoch: [0][100/391] Time 0.077 ( 0.109) Data 0.000 ( 0.022) Loss 7.1386e+00 (8.7030e+00)
Epoch: [0][200/391] Time 0.077 ( 0.097) Data 0.000 ( 0.011) Loss 7.2076e+00 (7.9629e+00)
Epoch: [0][300/391] Time 0.078 ( 0.092) Data 0.000 ( 0.007) Loss 7.1197e+00 (7.6941e+00)
Epoch: [0][391/391] Time 0.379 ( 0.089) Data 0.000 ( 0.006) Loss 7.0986e+00 (7.5622e+00)
max and min cluster size for 10-class clustering is (5474.0, 4808.0)
max and min cluster size for 20-class clustering is (2641.0, 2363.0)
max and min cluster size for 30-class clustering is (1783.0, 1581.0)
max and min cluster size for 40-class clustering is (1301.0, 1180.0)
max and min cluster size for 50-class clustering is (1062.0, 931.0)
max and min cluster size for 60-class clustering is (872.0, 759.0)
max and min cluster size for 70-class clustering is (751.0, 680.0)
max and min cluster size for 80-class clustering is (660.0, 560.0)
max and min cluster size for 90-class clustering is (584.0, 503.0)
max and min cluster size for 100-class clustering is (532.0, 437.0)
use time : 35.00101900100708
Epoch: [1][  0/391] Time 2.304 ( 2.304) Data 2.217 ( 2.217) Loss 8.4999e+00 (8.4999e+00)
Epoch: [1][100/391] Time 0.079 ( 0.101) Data 0.000 ( 0.022) Loss 3.6842e+00 (4.9914e+00)
Epoch: [1][200/391] Time 0.081 ( 0.090) Data 0.000 ( 0.011) Loss 2.8767e+00 (4.0724e+00)
Epoch: [1][300/391] Time 0.079 ( 0.086) Data 0.000 ( 0.008) Loss 2.8331e+00 (3.6705e+00)
Epoch: [1][391/391] Time 0.064 ( 0.085) Data 0.000 ( 0.006) Loss 2.8279e+00 (3.4812e+00)
max and min cluster size for 10-class clustering is (5509.0, 4724.0)
max and min cluster size for 20-class clustering is (2664.0, 2339.0)
max and min cluster size for 30-class clustering is (1779.0, 1466.0)
max and min cluster size for 40-class clustering is (1336.0, 1143.0)
max and min cluster size for 50-class clustering is (1120.0, 713.0)
max and min cluster size for 60-class clustering is (925.0, 594.0)
max and min cluster size for 70-class clustering is (789.0, 558.0)
max and min cluster size for 80-class clustering is (668.0, 419.0)
max and min cluster size for 90-class clustering is (625.0, 382.0)
max and min cluster size for 100-class clustering is (546.0, 309.0)
use time : 33.1533625125885
Epoch: [132][  0/391] Time 2.396 ( 2.396) Data 2.304 ( 2.304) Loss 1.2900e-02 (1.2900e-02)
Epoch: [132][100/391] Time 0.080 ( 0.103) Data 0.000 ( 0.023) Loss 1.2878e-02 (1.2905e-02)
Epoch: [132][200/391] Time 0.081 ( 0.092) Data 0.000 ( 0.012) Loss 1.2878e-02 (1.2901e-02)
Epoch: [132][300/391] Time 0.081 ( 0.088) Data 0.000 ( 0.008) Loss 1.2878e-02 (1.2896e-02)
Epoch: [132][391/391] Time 0.064 ( 0.086) Data 0.000 ( 0.006) Loss 1.2877e-02 (1.2894e-02)
max and min cluster size for 10-class clustering is (5509.0, 4724.0)
max and min cluster size for 20-class clustering is (2664.0, 2339.0)
max and min cluster size for 30-class clustering is (1779.0, 1466.0)
max and min cluster size for 40-class clustering is (1336.0, 1069.0)
max and min cluster size for 50-class clustering is (1120.0, 713.0)
max and min cluster size for 60-class clustering is (925.0, 594.0)
max and min cluster size for 70-class clustering is (789.0, 558.0)
max and min cluster size for 80-class clustering is (668.0, 419.0)
max and min cluster size for 90-class clustering is (625.0, 382.0)
max and min cluster size for 100-class clustering is (546.0, 309.0)
Hi, please organize the dataset as required by ImageFolder: https://pytorch.org/vision/main/generated/torchvision.datasets.ImageFolder.html Do not change our code until you can obtain the desired result.
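For anyone hitting the same issue: the torchvision CIFAR-10 download can be exported into the layout that ImageFolder expects (`train/<class_name>/<index>.png`). The helper below is a hypothetical sketch, not part of the repo; it only assumes each sample yields a PIL-style image with a `.save(path)` method:

```python
import os

def save_imagefolder(samples, classes, out_dir, split="train"):
    # Hypothetical helper (not part of the repo): write (image, label) pairs
    # into out_dir/split/<class_name>/<index>.png, the directory layout that
    # torchvision.datasets.ImageFolder expects.
    for i, (img, label) in enumerate(samples):
        cls_dir = os.path.join(out_dir, split, classes[label])
        os.makedirs(cls_dir, exist_ok=True)  # create class folder on demand
        img.save(os.path.join(cls_dir, f"{i}.png"))
```

With torchvision, something like `ds = torchvision.datasets.CIFAR10(root, train=True, download=True)` followed by `save_imagefolder(ds, ds.classes, "./data/cifar10_folders")` would produce the folder-organized dataset.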
Thank you very much for your help. I successfully reproduced the results in the paper after switching to the folder-organized dataset.
Glad to know.
Hello, first of all, thank you very much for your outstanding contribution! I had some problems replicating your paper's results. Could you please give the source of the datasets (such as CIFAR-10) used in the paper?