scok30 / vit-cil


Cannot converge on CIFAR100 #1

Closed SuQinghang closed 2 months ago

SuQinghang commented 3 months ago

Hello, thank you very much for your code! I'm trying to reproduce the results on CIFAR100 with the 50 + 5×10 configuration. However, running python main.py directly does not get the accuracy to improve:

{"task": 0, "epoch": 0, "acc": 1.28, "avg_acc": 1.28, "forgetting": 0.0, "acc_per_task": [1.28], "train_lr": 9.999999999999983e-07, "bwt": 0.0, "fwt": 0.0, "test_acc1": 1.3, "test_acc5": 9.26, "mean_acc5": -1.0, "train_loss": 0.58898, "test_loss": 4.07981, "token_mean_dist": 0.0, "token_min_dist": 0.0, "token_max_dist": 0.0}
{"task": 0, "epoch": 50, "acc": 1.28, "avg_acc": 1.28, "forgetting": 0.0, "acc_per_task": [1.28], "train_lr": 0.00012079425424930666, "bwt": 0.0, "fwt": 0.0, "test_acc1": 1.3, "test_acc5": 9.26, "mean_acc5": -1.0, "train_loss": 0.14257, "test_loss": 4.07981, "token_mean_dist": 0.0, "token_min_dist": 0.0, "token_max_dist": 0.0}
{"task": 0, "epoch": 100, "acc": 1.28, "avg_acc": 1.28, "forgetting": 0.0, "acc_per_task": [1.28], "train_lr": 0.00010847671483819926, "bwt": 0.0, "fwt": 0.0, "test_acc1": 1.3, "test_acc5": 9.26, "mean_acc5": -1.0, "train_loss": 0.14568, "test_loss": 4.07981, "token_mean_dist": 0.0, "token_min_dist": 0.0, "token_max_dist": 0.0}

Besides, I notice that only half of the training data is used in each epoch:

Image size is torch.Size([128, 3, 32, 32]).
Task: [0] Epoch: [123] [ 0/194] eta: 0:01:51 lr: 0.000101 loss: 0.7097 (0.7097) time: 0.5759 data: 0.3890 max mem: 3384
Task: [0] Epoch: [123] [ 10/194] eta: 0:00:40 lr: 0.000101 loss: 0.2805 (0.3522) time: 0.2174 data: 0.0355 max mem: 3384
Task: [0] Epoch: [123] [ 20/194] eta: 0:00:34 lr: 0.000101 loss: 0.2095 (0.2725) time: 0.1815 data: 0.0001 max mem: 3384
Task: [0] Epoch: [123] [ 30/194] eta: 0:00:31 lr: 0.000101 loss: 0.1528 (0.2306) time: 0.1815 data: 0.0001 max mem: 3384
Task: [0] Epoch: [123] [ 40/194] eta: 0:00:29 lr: 0.000101 loss: 0.1328 (0.2045) time: 0.1815 data: 0.0001 max mem: 3384
Task: [0] Epoch: [123] [ 50/194] eta: 0:00:27 lr: 0.000101 loss: 0.1205 (0.1872) time: 0.1815 data: 0.0001 max mem: 3384
Task: [0] Epoch: [123] [ 60/194] eta: 0:00:25 lr: 0.000101 loss: 0.1105 (0.1741) time: 0.1815 data: 0.0001 max mem: 3384
Task: [0] Epoch: [123] [ 70/194] eta: 0:00:23 lr: 0.000101 loss: 0.1052 (0.1645) time: 0.1816 data: 0.0001 max mem: 3384
Task: [0] Epoch: [123] [ 80/194] eta: 0:00:21 lr: 0.000101 loss: 0.1048 (0.1573) time: 0.1815 data: 0.0001 max mem: 3384
Task: [0] Epoch: [123] [ 90/194] eta: 0:00:19 lr: 0.000101 loss: 0.1038 (0.1513) time: 0.1814 data: 0.0001 max mem: 3384
Task: [0] Epoch: [123] Total time: 0:00:18 (0.0931 s / it) Averaged stats: lr: 0.000101 loss: 0.1039 (0.1483)
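(A note on the "half the data" observation: it is consistent with the 50 + 5×10 split itself, since the first task only contains 50 of CIFAR100's 100 classes. A quick sanity check, assuming batch size 128 and a dataloader that drops the last partial batch:)

```python
# In a 50 + 5x10 split of CIFAR-100, the first task covers 50 of the
# 100 classes, i.e. 25,000 of the 50,000 training images -- so each
# epoch of task 0 naturally touches only "half" of the training set.
total_train = 50_000                                  # CIFAR-100 training set size
images_per_class = total_train // 100                 # 500
first_task_images = 50 * images_per_class             # 25,000 images in task 0
batch_size = 128                                      # matches torch.Size([128, 3, 32, 32])
batches_per_epoch = first_task_images // batch_size   # 195 batches (drop_last)
print(first_task_images, batches_per_epoch)           # 25000 195
```

195 batches indexed 0..194 matches the `[ 90/194]` progress counters in the log above, so the dataloader is likely behaving as intended for task 0.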

Can you give me some advice? Thank you very much!

915067906 commented 2 months ago

I ran into the same problem; it seems the code released by the author needs to load a pre-trained model.
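(If that is the case, the usual pattern is to load a pre-trained backbone checkpoint with `strict=False` before incremental training, since CIL code typically adds heads or prompt tokens the checkpoint does not contain. A minimal sketch; the checkpoint path, the `"model"` key, and the helper name are assumptions, not taken from this repo:)

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_pretrained(model: nn.Module, ckpt_path: str):
    """Hypothetical helper: load backbone weights, tolerating extra/missing keys."""
    state = torch.load(ckpt_path, map_location="cpu")
    if isinstance(state, dict) and "model" in state:
        state = state["model"]  # some checkpoints nest weights under "model"
    # strict=False skips keys the CIL model adds (classifier head, prompts, ...)
    missing, unexpected = model.load_state_dict(state, strict=False)
    return missing, unexpected


# Tiny demo with a stand-in module (not the real ViT backbone):
ckpt_path = os.path.join(tempfile.gettempdir(), "demo_checkpoint.pth")
torch.save({"model": nn.Linear(4, 2).state_dict()}, ckpt_path)
missing, unexpected = load_pretrained(nn.Linear(4, 2), ckpt_path)
print(missing, unexpected)  # [] [] -- all weights matched
```

Checking the returned `missing`/`unexpected` lists is a quick way to confirm whether a checkpoint actually lines up with the model the repo builds.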

scok30 commented 2 months ago

Thank you for your interest in our work. To improve the usability of our code, we will integrate it into a widely used non-exemplar class-incremental learning repository within the next two weeks. We hope this will address your concerns. Thank you for your patience.